ABSTRACT

This thesis reports the design, conduct, and results of an experiment
intended to measure the paging rate of a virtual memory computer
system as a function of paging memory size. This experiment, conducted on
the Multics computer system at M.I.T., a large interactive computer utility
serving an academic community, sought to predict paging rates for paging
memory sizes larger than the memory existing at the time. A trace of all
secondary memory references for two days was accumulated, and simulation
techniques applicable to "stack" type paging algorithms (of which the
least-recently-used discipline used by Multics is one) were applied to it.

A technique for interfacing such an experiment to an operative computer
utility in such a way that adequate data can be gathered reliably
and without degrading system performance is described. Issues of dynamic
page deletion and creation are dealt with, apparently for the first reported
time. The successful performance of this experiment demonstrates the
viability of performing this type of measurement on this type of system.
The results of the experiment [illegible in source] demand paging behavior.
Introduction

1. Brief Statement of the Problem 

In this thesis, we describe and report the results of an experiment 
designed to predict the performance of automatically managed multilevel 
memory systems for a previously unexplored range of primary memory sizes. 

2. Summary of Results

We have developed techniques for predicting memory system performance
on an operative computer utility, utilizing an automatically managed
multilevel virtual memory. Based upon established theoretical techniques,
we have developed techniques to extract the necessary data from
a computer utility functioning under a live load. In doing so, we considered
problems of dynamic creation and deletion of pages which apparently
have not been dealt with previously. The viability of these techniques
was demonstrated by performing several measurements.

Using these techniques, we have found that, on the measured system,
the rate of accesses to data outside of primary memory decreased drastically
as primary memory size was increased above 2 x 10^8 bits (6 million
36-bit words, or 24 megabytes). We have found that the mean time between
these accesses, as a function of primary memory size, was best approximated
by a function of at least the second order, and possibly exponential.
Previous research on the system under consideration showed
a linear function to hold for primary memory size up to 1.3 x 10^8 bits
(4 million 36-bit words, or 16 megabytes) (S1). Although these results
do not attempt to characterize Multics, we believe that they are reasonably
representative of the observed class of user behavior.

3. Summary of the Work of this Thesis 

By means of an experiment on the Multics computer system (B2),
running on the Honeywell 645 at M.I.T., we have arrived at measurements
of the predicted reference rates to secondary memory for hypothetical
extensions of primary memory. These measurements were made on an actual
user load, the M.I.T. community, and not any sort of benchmark or test
load. From these measurements, models of program behavior in LRU*-managed
storage hierarchies can be derived. We suggest here one such model.

The essential technique for deriving these predictions from such
measurements is known in the literature (C1, C2) as the "extension
problem". It is based upon the properties of a class of memory management
algorithms known as "stack algorithms" (M1), which include LRU. Using
these properties, we were able to simulate the operation of the LRU
algorithm for larger primary memory sizes than the actual one present for
the identical user load. The input to this simulation was a history of
all references to data outside of primary memory, specifically, on disk,
during the period of measurement. It is a property of the stack
algorithms that one measurement and simulation can be used to predict
secondary memory reference rates for all primary memory sizes.

The work reported in this thesis is significant because it is both
the first reported measurement of this type on a paged, segmented,
multiprogrammed computer system, and an extension of our range of
knowledge of the so-called "headway" function which we have described
above. Previous measurements of this function (S1) involved other
techniques, and only investigated it for primary memory sizes of up to
1.3 x 10^8 bits. Our measurements explored regions approaching 4 x 10^8 bits.
Although there is no inherent limit on the range which could in principle
be explored by our techniques, our explorations were limited only by the
noteworthy fact that, over a day's running of the experiment, no more
than 4 x 10^8 bits of information were referenced more than once by the
M.I.T. community.

*LRU, for Least Recently Used: a memory management policy whereby the
least recently used data is moved to slower memory when space is needed
in faster memory.

The significance of the actual resulting measurement is twofold: 
First, it provides an example of typical behavior for the measured sys- 
tem. Second, it suggests more general models of program behavior. 

4. Structure of this Thesis 

Chapter 1 discusses the concepts of paging and virtual memory. We 
provide justification for the types of statistics and models we seek and
describe how to use them in performance predictions. We discuss previous 
research in this area, and provide a more detailed statement of the 
novelty of this thesis. 

Chapter 2 describes the experiment. We describe the relevant features
of the so-called "stack" algorithms (M1), and the extension problem.
We discuss the problems of adapting this type of experiment to the
multilevel memory system of Multics. We describe the difficulties in 
performing this experiment on an operating computer utility, and the 
solutions we adopt. 

Chapter 3 gives the results of the experiment. The results are presented
graphically, and we suggest their interpretation. We analyze
these results, and provide a detailed error analysis.

Chapter 4 is a summary of the work done. We suggest future direc- 
tions for research, and pose some of the questions left unanswered by 
this thesis. 

There are three appendices.

Appendix A is an extremely detailed description of the Multics paging 
control algorithm, as it was at the time of the experiment. We describe 
it on several levels, allowing comprehension by the reader on whichever 
one he chooses. This background is useful for full comprehension of certain
design decisions in the planning of the experiment. It is also the
first publication of this algorithm at this level of detail (Corbató (C4)
provides a less detailed discussion).

Appendix B describes how the actual events of Multics memory manage- 
ment were mapped into the idealized events of theoretical interest to the 
experiment. We describe the modifications and the interface to the Multics
supervisor necessary for this experiment. We assume that the reader
has some comprehension of the previous appendix. 

Appendix C is a graphical presentation of user load, idle time, and
paging overhead on the Multics system on the days of the experiment.
These figures were derived from routine metering performed by the admin- 
istration of the M.I.T. Information Processing Center. 
Chapter 1



1.1 Memory Performance Prediction as a Goal

As digital computer systems have increased in size and complexity
since their inception almost twenty years ago, so have the memory architectures
required to support increasingly advanced applications and systems.
What is more, progress in memory technology has created a plethora
of memory media, ranging over a wide gamut of costs, speeds, and properties.
The desire for increased throughput, and in real-time systems,
the desire for quick response, create a need for the fastest memory technology
available. The fastest media, however, are almost always the most
expensive on a cost-per-bit basis. Thus, for a given computer system to
achieve or approach desired goals of memory access speed within a given
economic constraint, it becomes useful for memory systems consisting of
varying amounts of mixed memory technologies to be used in one installation.

Most computers of the past twenty years have used magnetic core as 
their main, or primary memory. That is to say, the processor was capable 
of fetching data and instructions only from core memory. Further memory
demands were met by the use of tapes, disks, and other bulk media, whose
contents could be transferred in or out of selected areas of primary
memory by explicit program request. Most of the programs and operating
systems designed for this type of architecture allocated these areas for
input/output transfers in fixed, specific regions of primary memory.
When programs could not fit in their entirety in primary memory, they
were divided into independent pieces, or overlays, which were transferred
in and out of primary memory essentially at their own discretion.

In the last few years, a strategy known as virtual memory has achieved 
popularity. With this scheme, programs are allowed, effectively, to re- 
ference data or instructions in primary memory or on any secondary storage 
device in an identical manner, creating the impression of a very large, 
or in some cases, conceptually infinite primary memory. References to 
secondary memory cause software intervention, signalled by specialized 
hardware, which results in selected code or data fragments being read into 
primary memory. Clearly, this implies replacement of some other code or 
data currently in primary memory, and in order to facilitate this task, 
such systems divide all primary and secondary storage into equal-sized 
areas, called blocks, or page frames. Information in the system is divided
into pages, which may reside in various page frames at various times.
This implementation of virtual memory is thus known as demand paging, as
pages are read in on demand, i.e., when referenced. The selection of 
appropriate pages in primary memory for replacement is a critical issue, 
and is still a basis for much further study. 

A page fault, as the software-assisted fetch of a page not in pri- 
mary memory is called, represents lost time. The time required to ac- 
cess and transfer the copy of the page on secondary storage is time during 
which the requesting program may not run. The time that a processor must 
spend in page fault software, deciding on an appropriate page to replace, 
is a system overhead, which does not contribute to the progress of users' 
programs. Multiprogramming, a scheme almost universally used on medium 
and large scale systems, allows processors to serve one user's program 
while another's is suspended, say for a page fault. But even here, most 
systems limit the degree (number of simultaneously runnable users) of
multiprogramming, and page faults can lead to a situation where a pro- 
cessor spends an undesirably large fraction of its time sitting idle, 
accomplishing no function at all. Furthermore, the primary memory space 
occupied by all of the faulting program is unusable by any program for the 
duration of the transfer. Thus, the minimization of page faults in a 
virtual memory system is extremely desirable. It is an important function 
of the page-replacement algorithm, as the procedure which selects pages 
for replacement at page-fault time is known, to attempt to minimize the
number of page faults in the foreseeable future. These decisions are usually
made with information gleaned from observation of page usage in the
immediate past, occasional knowledge of predicted page usage patterns, 
and some general models of program behavior. 
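The interaction between demand paging and a replacement algorithm can be illustrated with a small simulation. The following is a minimal sketch, assuming an LRU replacement policy and a page trace given as a simple list; the function name `count_lru_faults` and the sample trace are hypothetical and are not taken from the Multics implementation.

```python
from collections import OrderedDict


def count_lru_faults(trace, frames):
    """Count page faults for LRU demand paging with `frames` page frames."""
    resident = OrderedDict()  # keys ordered from least to most recently used
    faults = 0
    for page in trace:
        if page in resident:
            resident.move_to_end(page)        # hit: now the most recently used
        else:
            faults += 1                       # page fault: fetch on demand
            if len(resident) >= frames:
                resident.popitem(last=False)  # evict the least recently used page
            resident[page] = True
    return faults


# Hypothetical reference trace, for illustration only.
trace = ["a", "b", "c", "a", "d", "b", "a"]
for frames in (2, 3, 4):
    print(frames, count_lru_faults(trace, frames))
```

On this trace the fault count falls from 7 to 5 to 4 as the number of page frames grows from 2 to 4, which is the dependence on primary memory size that the rest of this chapter seeks to predict.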

Many page-replacement algorithms have thus been designed for virtual
memory systems with the explicit objective of minimizing page faults.
These algorithms are subject to mathematical analysis, which is not true
of arbitrary user programs. Hence, by careful observation of the storage
references made by a program or multiprogrammed collection of programs
(although the latter clearly requires some further remarks), we can analyze
its interaction with any given page-replacement algorithm running
in any given size of primary memory, and ascertain which page faults
would or would not have occurred had primary memory been some other size.
These techniques are not in general applicable to non-virtual memory systems,
for many programs have no idea of how large a memory they are
running in, or how to take advantage of it, and thus explicitly-requested
data transfers are not affected by changing memory size in any interesting
or easily analyzable way.

The ability to determine page fault rates (page faults per unit time) 
for different memory sizes is a powerful tool in both performance analysis 
and memory system engineering. Sekino (S2) has shown the explicit depen- 
dence of response time and throughput in multiprogrammed systems on the 
mean headway between page faults (MHBPF). This quantity describes the 
mean amount of useful work done by user programs between each two page 
faults. It is most conveniently measured in total references to the vir- 
tual memory. If the mean amount of system overhead associated with a 
page fault is known, as well as a proper characterization of system idle 
time, we may compute MHBPF from the mean real time between page faults
(MTBPF) and the processor reference rate. Hence, predictive techniques
to obtain page fault rates for contemplated memory sizes can be used to
deduce the system throughput and response time figures which would result.
Thus, if one can indeed predict these figures, the economic tradeoffs
involved in acquiring improved memory system performance by increasing 
primary memory size may be evaluated more methodically. 
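The conversion from MTBPF to MHBPF mentioned above can be sketched numerically. The text does not give the formula, so the decomposition below is an assumption: the real time between faults is taken to consist of idle time (modelled as a fixed fraction of real time), a fixed per-fault overhead, and useful processor time, and headway is the useful time multiplied by the reference rate. The function name and all numbers are hypothetical.

```python
def mhbpf_from_mtbpf(mtbpf_s, overhead_s, idle_fraction, refs_per_s):
    """Estimate mean headway (memory references) between page faults.

    Assumed model, not taken from the text:
      useful time = MTBPF * (1 - idle fraction) - per-fault overhead
      MHBPF       = useful time * processor reference rate
    """
    busy_s = mtbpf_s * (1.0 - idle_fraction)  # real time during which a processor is not idle
    useful_s = busy_s - overhead_s            # remove page-fault software overhead
    if useful_s < 0:
        raise ValueError("overhead exceeds the non-idle time between faults")
    return useful_s * refs_per_s


# Hypothetical figures: 10 ms between faults, 2 ms overhead per fault,
# 20% idle time, one million references per second.
print(mhbpf_from_mtbpf(0.010, 0.002, 0.20, 1.0e6))
```

With these illustrative figures the estimate is about 6000 references between faults; a different characterization of idle time would change the first step of the calculation.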

The use of more than one type of secondary memory in a single system 
results in a situation where the average time to access a data item in 
any part of the storage system is a function of both the average access 
time to a data item in each unit and the probability of accessing that 
unit. In a demand paging system, the probability of accessing each unit 
is the sum of the probabilities of accessing each page stored on it. If 
one can associate these probabilities with given pages of such a system, 
one can create a composite memory system with an optimal average access 
time within any given cost constraint. Ramamoorthy and Chandy (R1) have
given an algorithm whereby such a system may be constructed out of any
collection of memory types whose speed and cost-per-bit characteristics
are known. In any case, it is clear that one should keep the pages with 
the highest reference probability on the fastest storage devices. Al- 
though the identities of these pages may be determined by experimentation, 
observation, and program analysis, one can view these probabilities and/or
identities as functions of time. Thus, one can devise algorithms which 
attempt to maintain pages with given ranges of next-reference probabilities 
on appropriate storage devices. It should be fairly apparent that this 
problem is identical to that of maintaining pages in primary memory with 
the intent of minimizing page faults. This will be discussed further later
on. Thus, the design of an optimal multilevel storage system, as such
configurations are known, can also be analysed by the techniques of primary
memory paging analysis. Again, the assumption of an appropriate
model of program behavior, both in general and for the particular system
at hand, is of crucial importance.
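The dependence of average access time on per-unit access probabilities, described above, amounts to a probability-weighted sum. A minimal sketch follows; the function name and the three-level configuration (core, drum, disk with the stated times and probabilities) are hypothetical values chosen only for illustration.

```python
def mean_access_time(levels):
    """Average access time over storage units.

    `levels` is a list of (access_probability, access_time_seconds) pairs,
    one pair per storage unit; the probabilities are expected to sum to 1.
    """
    total_p = sum(p for p, _ in levels)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError("access probabilities must sum to 1")
    return sum(p * t for p, t in levels)


# Hypothetical configuration: (probability of access, access time in seconds).
levels = [
    (0.95, 1.0e-6),   # primary (core) memory
    (0.04, 5.0e-3),   # fast secondary device
    (0.01, 50.0e-3),  # slow secondary device
]
print(mean_access_time(levels))
```

Even with only 5% of accesses leaving primary memory, the slower units dominate the average in this example, which is why the placement of the most frequently referenced pages matters so much.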
1.2 Program Reference Patterns and Models

Computer programs being among the most deterministic of all things, 
any characterization of the data reference patterns of any particular pro- 
gram may be obtained by the simulated running of that program and the 
observation of whatever characterizations are desired. However, system 
engineering requires characterizations of programs which are to be run, 
which, for the most part, have not yet been written. In a given computer 
system, running under a given operating system, most running programs
have many features of their memory usage patterns in common. For instance,
in an operating system where an Algol-60 or PL/I-like run-time stack is
native to the environment, the pages containing the top of the stack will
always have a higher next-reference probability than pages representing
lower regions. If the supervisor itself is paged, i.e., running in a 
virtual memory, the same as users' programs, the supervisor has its own 
reference patterns which will be present in any run of the system. The 
same is true of compilers, assemblers, system utilities, library routines, 
and other service programs. Code generated by the same compiler is likely 
to produce certain common features in its reference patterns, particularly
on a local level. Thus, there is great value in observing typical be- 
havior of programs in a large computer system, and trying to formulate 
some model which is in some sense average or typical. 

In a multiprogrammed system, this averaging is done for us in real 
time. An experimental observation of program behavior in a multipro- 
grammed computer system, made over some reasonable period of time, say a 
day, will produce a characterization of typical system behavior, if one 
indeed believes that such exists. This characterization takes into consideration
all of the programs run in that day, and relies on the common
features of programs discussed above to have any validity at all. The
interval of a day is chosen as reasonable, for that is the cycle time of
many forms of human interaction with a computer. People deal with an 
interactive computer system for several days, doing the same type of work 
at similar hours in the day. 

The particular model of reference behavior that we seek describes the
next-reference probability of pages in a virtual memory system as a function
of position in a certain dynamic ordering, known as a stack, imposed
by the page-replacement algorithm. The class of algorithms amenable to
this analysis are precisely those which would keep the top n pages of this
ordering in an n-page primary memory, were it used to manage such. This
will be discussed more fully in section 2.1. What is important here is
that we can arrive at a function p(x), where p is the probability of
reference to position x in this ordering. It is the object of that sub-class
of these page-replacement algorithms which are actually useful for memory
management to make this function monotonically decreasing. If the
algorithm actually succeeds at this, it is clear that the n pages which are
most likely to be referenced will probably be in the n-page primary memory,
and thus the page-replacement algorithm has succeeded in minimizing
references outside of the n-page primary memory, or page faults. In the
case of multilevel memories, we can pick out whatever positions in the
ordering are appropriate, by Ramamoorthy and Chandy's algorithm, and assign
them to whatever storage unit is required. C. K. Chow (C3) has also
given an algorithm whereby an optimal multilevel memory system within a cost
constraint may be constructed directly from the function p(x).
1.3 The Experimental Determination of Predicted Headways

If we accept the function p(x) as a valid characterization of average 
and typical behavior in a multiprogrammed system, we may predict page 
fault headways for hypothetical memory extensions from it. Furthermore, 
the function p(x) may be measured experimentally. In this section, we show 
how to approximate and use p(x) in this way. 

Here x is the position of a page in the algorithm-imposed ordering we have
been discussing. Assume we have constructed the necessary tools to measure
r(x), where r(x) is the number of times a page in position x of the
ordering was referenced. Assuming pages which were never touched to be in
position "infinity" of the ordering, then the relative frequency of
touching a page in position x is

$$ f(x) = \frac{r(x)}{\sum_{t=1}^{\infty} r(t)} \tag{1} $$

Here, the numerator is the count of references to position x, and the 
denominator is the total number of references to all positions. If p(x) 
is indeed a valid characterization, f(x) should approximate p(x). 
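Equation (1) can be sketched in a few lines. The example below assumes that the measured counts r(x) are held in a list whose index 0 corresponds to stack position 1, and that the infinite sum is truncated at the last position for which a count was recorded; the function name and the counts are hypothetical.

```python
def relative_frequencies(r):
    """Return f(x) = r(x) / sum of r(t) over all measured positions.

    r[x - 1] holds the reference count for stack position x.
    """
    total = sum(r)
    if total == 0:
        raise ValueError("no references were recorded")
    return [count / total for count in r]


# Hypothetical reference counts for stack positions 1 through 4.
r = [50, 30, 15, 5]
print(relative_frequencies(r))
```

The resulting list sums to one, so it may be read directly as an empirical estimate of p(x) over the measured positions.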

We have stated that, for the class of algorithms under consideration,
the first k positions of this ordering at any time contain precisely those 
pages which would be in a primary memory of size k. Hence, references to 
pages in the first k positions of the ordering never cause a page fault 
in a k-page primary memory, and references to pages in any position beyond 
k always cause page faults. Hence, if a program makes H references to 
the virtual memory, the number of page faults it will take in the course 
of those references is identically the total number of references made
to positions in the ordering beyond the primary memory size. Thus, the
program, running in a k-page primary memory, will produce a mean headway

$$ \mathrm{MHBPF}(k) = \frac{H}{\sum_{t=k+1}^{\infty} r(t)} \tag{2} $$

This relation holds true for any primary memory size k. If we have an
actual system, running in a primary memory of size n, we can predict the
MHBPF which would result on this system were memory extended to size E,
E being greater than n. We assume that we can measure MHBPF(n) on the
existing system, and that a tool for measuring r(t), for t > n, is available.
Then the same program which takes H virtual memory references will have a
MHBPF in the E-page memory of

$$ \mathrm{MHBPF}(E) = \frac{H}{\sum_{t=E+1}^{\infty} r(t)} \tag{3} $$

We now divide equation (3) by equation (2), taken at k = n, obtaining

$$ \frac{\mathrm{MHBPF}(E)}{\mathrm{MHBPF}(n)} = \frac{\sum_{t=n+1}^{\infty} r(t)}{\sum_{t=E+1}^{\infty} r(t)} \tag{4} $$

Observe that this equation allows us to predict MHBPF from a measured
MHBPF and measured reference counts, a fact which will be used
later. We now rewrite equation (1) to read

$$ r(t) = f(t) \sum_{u=1}^{\infty} r(u) \tag{5} $$

letting t be what was x and u be what was t. Substituting (5) in (4),
replacing r(t), we obtain



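$$ \frac{\mathrm{MHBPF}(E)}{\mathrm{MHBPF}(n)} = \frac{\sum_{t=n+1}^{\infty} \left[ f(t) \sum_{u=1}^{\infty} r(u) \right]}{\sum_{t=E+1}^{\infty} \left[ f(t) \sum_{u=1}^{\infty} r(u) \right]} = \frac{\sum_{u=1}^{\infty} r(u) \sum_{t=n+1}^{\infty} f(t)}{\sum_{u=1}^{\infty} r(u) \sum_{t=E+1}^{\infty} f(t)} = \frac{\sum_{t=n+1}^{\infty} f(t)}{\sum_{t=E+1}^{\infty} f(t)} \tag{6} $$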

Multiplying both sides by MHBPF(n), we obtain

$$ \mathrm{MHBPF}(E) = \mathrm{MHBPF}(n) \, \frac{\sum_{t=n+1}^{\infty} f(t)}{\sum_{t=E+1}^{\infty} f(t)} \tag{7} $$

This equation states that mean headway between page faults which would 
result from a memory extension to E pages may be computed from the mea- 
sured mean headway between page faults on the unextended memory, and a 
factor which is a function only of the program or programs being run and 
the memory sizes concerned. The work of our thesis is to compute this 
factor. 
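A numerical sketch of this prediction may make the role of the factor clearer. It assumes that the relative frequencies f(t) are available as a list whose index 0 corresponds to position 1, and that the infinite sums are truncated at the last measured position; the function name and the sample figures are hypothetical and are not measurements from this thesis.

```python
def predict_mhbpf(mhbpf_n, f, n, E):
    """Predict MHBPF(E) from a measured MHBPF(n), in the manner of equation (7).

    f[t - 1] is the relative frequency of reference to stack position t.
    """
    if not 0 < n < E:
        raise ValueError("require 0 < n < E")
    tail_n = sum(f[n:])  # sum of f(t) for t = n+1 onward
    tail_E = sum(f[E:])  # sum of f(t) for t = E+1 onward
    if tail_E == 0:
        raise ValueError("no references beyond position E; predicted headway is unbounded")
    return mhbpf_n * tail_n / tail_E


# Hypothetical frequencies for positions 1 through 7, and a hypothetical
# measured headway of 1000 references in a 2-page memory.
f = [0.5, 0.2, 0.1, 0.08, 0.06, 0.04, 0.02]
print(predict_mhbpf(1000.0, f, n=2, E=4))
```

Here the tail beyond position 2 carries 0.30 of all references and the tail beyond position 4 carries 0.12, so the predicted headway for the 4-page memory is about 2.5 times the measured one. Because the normalizing total cancels, the raw counts r(t) could be passed in place of f(t) with the same result, as equation (4) indicates.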




1.4 Previous Work in this Area

Since the advent of virtual memory computer systems, the function 
MHBPF(x) has been of great interest, being an easily identifiable charac- 
terization of memory system performance. Investigators have run many pro- 
grams in simulation, obtaining this mean headway as a function of memory 
size experimentally. Almost all of these experiments have been done on 
machines which attempt to 'compress' a program into a smaller space than
that in which it was intended to run. Such systems may typically attempt 
to fit five or ten programs, each running in a 32 k virtual memory into a 
core memory of 96 to 150 k. In such instances, the set of pages referenced 
by each program is small, as is the potential set which it can reference. 
These sets of pages are usually disjoint, as they represent disjoint 
virtual memories. Virtual memory in this case is simply a technique to 
force several programs into a primary memory too small to contain all of 
them. 

Such work has been reported by Belady (B1), Belady and Kuehner (B3),
and Fine et al. (F1), among others. A large amount of this work was done
on an IBM M44/44X, a 7040 type machine at IBM Research Labs adapted to
demand paging. Belady and Kuehner report an expected HBPF for single
programs running on this system of the general form e = a n^2, n being
primary memory size.

Brawn and Gustavson (B4) performed some measurements of typical com- 
putational programs running on the same M44/44X. These measurements were 
significant as they are apparently the first reported measurements of 
programs specifically written for a virtual memory. They observed the 
running time of programs, including page fault overhead, as a function
of real memory size. No analytic models were suggested.

Some performance analysis done by Schwartz (S3) on a Burroughs model 
6700 is also of interest here. In this system, all data available to a 
program is referenced as variable-size segments, brought into core on a 
demand basis. Program code and certain data segments are shared, and the
amount of information potentially accessible to a program is extremely
large. He reported headway functions of the form e = exp(a·n), variables
the same as above, for missing-segment exceptions as memory size was
varied. (These were actual measurements performed on various memory con- 
figurations.) 

The research which directly led to this thesis was done by Saltzer, 
and later by Saltzer, Webber, and Snyder (S1). Saltzer measured the MHBPF
on the Multics system (B2) at M.I.T., with two different sizes of configured
memory. He obtained the result e = a·n, which has since been called the
'linear paging model'. Saltzer later reported the results of an experiment
designed and conducted by Webber and Snyder, in which the reorderings of the
list by which the Multics paging drum is maintained were observed. Using 
the techniques described in 1.2 above, MHBPF(n) was extrapolated to a memory 
size of 4000 pages (each Multics page is 1024 words by 36 bits), and was 
found to be still within experimental error of the linear paging model. 
1.5 Novelty of the Work in this Thesis

The work performed in this thesis was originally conceived as an 
extension to Saltzer and Webber's experiment, elucidating the nature of 
MHBPF(n) for n greater than 4000 pages. The limitations of the linear 
model were sought, as was the nature of whatever model held beyond that
range.

This series of experiments on the Multics system is unique for several
reasons. The data accessible to any program in Multics is potentially
the entire storage system, and all data accesses are made via the virtual
memory mechanism. This is similar to the Burroughs scheme, but dissimilar
to the paged 'compressing' type systems described above. Furthermore,
sharing is an extremely important consideration in Multics, as all program
code, including the supervisor, is shared. This thesis is also
apparently the first reported attempt to deal with dynamically variable
virtual memories, i.e., those whose size grows and shrinks on a
second-to-second basis. The issues of dynamic page creation and destruction
which result from this policy are systematically dealt with by our experiment.

The use of virtual memory seems to be gaining in popularity as large 
general-purpose information systems become more common. Increased interest 
in systematic protection schemes has resulted in many new designs for 
systems having segmented addressing features similar to those found in 
Multics. Demand paging has achieved considerably more popularity and
widespread use than the Burroughs techniques as an implementation for 
segmentation, and has recently been added by IBM to their extremely popu- 
lar System/370. 
For these reasons, we feel that experiments made on a Multics-like
system are relevant to data systems in the near future, and the reference
patterns observed may have some features which are in some sense
characteristic of programs running in segmented, paged environments.
Chapter 2

2.1 Stack Algorithms and the Extension Problem 

The substance of the experiment performed was to reconstruct the entire
history of a day's LRU-maintenance of the Multics storage hierarchy,
and attempt to predict page-fault headways for hypothetical memory
configurations from this history.

The basic strategy of memory simulation used was that proposed by 
Mattson et al. (Ml). This technique, known as stack simulation, relies 
on the fact that a large number of useful paging algorithms, including 
LRU, have the property that after any fixed number of addresses in an address
trace have been processed by the algorithm, the pages which are left
in primary memory are always a subset of what they would have been at the 
same point in the trace had primary memory been larger. This feature, 
known as the "inclusion property", thus defines the class of "stack 
algorithms". From this property, at any given point in the processing of 
an address trace an ordering can be constructed. The first page in this 
ordering would be that page which would then be in primary memory were it 
of single-page capacity, the second would be that page which would also 
be in primary memory were it of two-page size, the third that which would
be added were memory of three-page size, and so forth. The history of 
the processing of an address trace can be viewed as a series of these 
orderings, which are known as "stacks", the single page corresponding to
unit-size memory normally being considered the "top". As each new re- 
ference is processed, the algorithm causes the stack to be reordered, 
possibly corresponding to page motion for some size memory. The top n
pages on the stack being the pages which would be in a memory of n-page
capacity, any motion of a page into the top n-pages implies a physical 
reading of a page into primary memory. For a demand stack algorithm, this 
movement can occur only as the result of a page fault. Thus, we may infer 
the behavior of an n-page primary memory by observing the number of times 
that reference is made to position n+1 or beyond in such a stack. As we 
have defined this stack and the class of algorithms processing it as main- 
taining the first n pages in this stack in an n-page memory, no reference 
to position n or below can ever cause a page fault. Mattson's technique 
consists of taking a recorded or proposed address trace, running it 
through a program which constructs the sequence of stacks just described, 
and accumulates the total number of references to each position therein.
When the processing of the trace begins, the stack is void, corresponding
to an empty primary memory. At least until a given page is fetched into
primary memory the first time, it will not have been in the stack at all, 
and its first fetch may be considered to have been made from position 
"infinity". As the trace progresses, and repeated references to pages 
are made, we accumulate counts for each position in the stack of how many 
times a page in that position was moved upward by the algorithm. It can 
be shown that for a demand stack algorithm, the only condition on which 
a page may move upward in the stack is that it is that page which has 
just been referenced. Simply, were this not the case, a page in position 
n would move into an (n-1)-page primary memory without having been
referenced, and the algorithm would not be a demand paging algorithm. At
the completion of the address trace, we can, for any n, sum the reference
counts for positions n+1 to the total final length of the stack, plus the
count for position "infinity", and this will be the number of page faults
which would have transpired had that address trace been managed by the
algorithm used in an n-page primary memory. Note that a single processing
of the trace can be used to produce a result which can then be used to 
analyze any hypothetical memory size. 
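
As an illustration of the procedure just described, the following Python
sketch processes a short trace once under LRU and then derives fault
counts for several memory sizes from the accumulated position counts. The
trace, the names lru_stack_distances and page_faults, and the use of a
Python list for the stack are assumptions made for this sketch; they are
not taken from the programs used in the experiment.

```python
from collections import Counter

INFINITY = float("inf")  # position assigned to a page's first reference


def lru_stack_distances(trace):
    """Process a page trace once; return a Counter mapping stack
    position (1 = top) to the number of references to that position."""
    stack = []  # stack[0] is the top of the LRU stack
    counts = Counter()
    for page in trace:
        if page in stack:
            position = stack.index(page) + 1
            stack.remove(page)
        else:
            position = INFINITY  # never seen before: not in the stack
        counts[position] += 1
        stack.insert(0, page)  # the referenced page moves to the top
    return counts


def page_faults(counts, n):
    """Faults an n-page memory would incur: references to positions
    n+1 and beyond, including position "infinity"."""
    return sum(c for position, c in counts.items() if position > n)


# Hypothetical trace, for illustration only.
trace = ["a", "b", "c", "a", "b", "d", "a", "c"]
counts = lru_stack_distances(trace)
for n in (2, 3, 4):
    print(n, page_faults(counts, n))  # prints 2 8, then 3 5, then 4 4
```

The trace is read only once; page_faults may then be evaluated for any n
without reprocessing it.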

This technique allows us to ascertain the page fault count for the
interval under consideration for any contemplated memory size. By simply
dividing the total system headway during the actual trace by this
page-fault count, we may thus ascertain the predicted mean time between page
faults (MHBPF) for that storage system. Furthermore, we can plot the
reference counts at each position, normalized with respect to the total
number of reference counts, versus the position number, and obtain a graph
which describes what we shall call access frequencies. With this, we can
analyze the behavior of multilevel memory systems processing this trace,
and obtain an optimal such system within cost constraints as described in
Chapter 1. The shape of this graph also tells us much about the relative
success of the particular algorithm in managing that particular address
trace, without regard to any single memory configuration. We will con- 
sider the particular graph in the case of our results in greater detail 
in the next chapter, and in so doing further consider such graphs in 
general. 
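
The two derived quantities above can be sketched in Python as follows.
The position counts and the headway figure are invented for illustration,
and the function names are hypothetical; only the arithmetic (headway
divided by fault count, and counts normalized by their total) follows the
text.

```python
INFINITY = float("inf")

# Hypothetical reference counts per stack position (not measured data).
counts = {1: 50, 2: 25, 3: 15, 4: 6, INFINITY: 4}


def mean_headway_between_faults(counts, n, total_headway):
    """Total headway divided by the faults an n-page memory would take."""
    faults = sum(c for position, c in counts.items() if position > n)
    return total_headway / faults if faults else INFINITY


def access_frequencies(counts):
    """Reference count at each position, normalized by the total count."""
    total = sum(counts.values())
    return {position: c / total for position, c in sorted(counts.items())}


# Headway is taken here to be the number of references in the trace.
total_headway = sum(counts.values())
print(mean_headway_between_faults(counts, 2, total_headway))
frequencies = access_frequencies(counts)
print(frequencies)
```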

Our experiment sought to learn the shape and nature of this graph at 
positions corresponding to memory sizes of many thousands of pages. In 
order to record a reference to position n in a stack as described, there 
must clearly be n-1 items above it. This implies that at least n
distinct items have been referenced by the time a reference to the nth
position occurs. It can be seen that producing meaningful statistics at
the ten-thousandth page position would require a prodigious address
trace. At this point advantage may be taken
of another remarkable property of stack algorithms. It is possible to con- 
struct the portion of the stack from position n+1 to the end without a full 
address trace — we use information extracted from a running algorithm 
managing an n-page primary memory at times when page faults occur. This 
is known as the "extension problem" (C1,C2). The technique is as follows: 
we maintain the stack (the "extension stack") for positions n+1 and be- 
yond. When a page fault occurs, we know that the page faulted on cannot 
be in the first n positions of the stack -- if so, it would not have been 
faulted on. We locate the page in the extension stack; if not there, we 
may consider it as having been at position "infinity". The counter cor- 
responding to the position from which the page was fetched is incremented. 
We remove the page in question from the extension stack: it is now in the 
top position of the real stack, which we are not maintaining. We now use 
whatever information is necessary, from that normally obtainable to the 
running algorithm, plus that we are maintaining, to reorder the extension 
stack according to the policy of the running algorithm. This reordering 
will usually include placing some page removed from primary memory by 
the running algorithm at some point in the extension stack. In the case 
where the replacement algorithm is LRU, the page removed from primary 
memory is placed on top of the extension stack, and all pages previously 
in the extension stack move down one location. Note that pages which 
were below the fetched page in the extension stack stay in place during 
the entire transaction. 
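
A minimal Python sketch of this extension technique for LRU follows. It
assumes that each page fault is reported as a pair (page faulted on, page
removed from primary memory, or None when an empty frame was used); the
function name process_fault and the event list are hypothetical. The
events shown are those a 2-page LRU memory would generate for the trace
a b c a b d a c.

```python
from collections import Counter

INFINITY = float("inf")


def process_fault(extension, counts, n, faulted, ousted=None):
    """Update the extension stack for one page fault.

    extension[0] corresponds to stack position n+1.
    """
    if faulted in extension:
        index = extension.index(faulted)
        counts[n + 1 + index] += 1  # position from which it was fetched
        del extension[index]        # it now sits atop the real stack
    else:
        counts[INFINITY] += 1       # never seen: position "infinity"
    if ousted is not None:
        extension.insert(0, ousted)  # LRU: removed page goes on top


n = 2
events = [("a", None), ("b", None), ("c", "a"), ("a", "b"),
          ("b", "c"), ("d", "a"), ("a", "b"), ("c", "d")]
extension, counts = [], Counter()
for faulted, ousted in events:
    process_fault(extension, counts, n, faulted, ousted)
print(dict(counts), extension)
```

The counts obtained for positions 3, 4, and "infinity" agree with those a
full LRU stack simulation of the same trace would produce, although only
the fault events were examined.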

The advantages of using a trace of a running algorithm in a large
system measured over an extended period of time, as opposed to a trace
obtained by simulation of a given program over a necessarily much shorter
period of time, are straightforward. We are interested in system
performance on an hour-to-hour, not second-to-second, basis, and day-long
measurements of a live system correspond to both the time scale and load mix
of interest. As long as the accuracy of the measurement can be maintained, 
this day-long extension measurement is much more useful than the simulated 
running of a program. 

As a demonstration of the power of this extension technique, we may
consider the Multics system: 400,000 references to the virtual memory
occur every second. Approximately 100 page faults occur each second.
Recording 2 data items for each page fault (200 items per second in place
of 400,000), we have reduced the amount of data which must be recorded by
a factor of two thousand.

It should be clear that the results of this experiment, although simu- 
lating hypothetical memory system performance, do not represent simulated 
results. The measurements made correspond to an uncontrolled user
population during normal working days, using arbitrary programs under 6-way
multiprogramming. The results thus show how a hypothetical memory
system would have behaved under this real user load.

2.2 The Extension Problem and Multics

The Multics system has a physical memory consisting of 256 K (1 K =
1024 36-bit words = 3.7 x 10^4 bits) to 384 K words (1.2 x 10^7 bits) of
core, a 2000 to 4000 K word (1.5 x 10^8 bits) drum, and approximately
90,000 K words (3.3 x 10^9 bits) of disk, both moving and fixed head. The
variabilities stated above are dependent upon the time of day and the user
load, governed by administrative policy. The entire storage system is
divided into 1024-word pages, and is managed by the demand paging mechanism
(with the exception of several thousand words of non-pageable code and
data, such as the code for the paging mechanism itself, which must be
non-pageable in any case, and are thus not really of interest in memory
performance prediction). The algorithm used to manage replacement of
pages in the core memory is essentially LRU. The variation from LRU is
explained in detail in Appendix A. Essentially, within the constraints
of operating system overhead and the precision of measurement of recency
of use provided by the hardware, it tries to implement LRU as closely as
possible. Also, a non-demand prepage/postpurge policy was in effect
during these measurements, which caused some pages to move in and out of
core outside of the control of the LRU algorithm.

The 4000-page paging drum was at this time being used in a mode
which attempted to overcome rotational latency by making multiple copies
(S4), in this case two, and hence was of 2000 page capacity during these
experiments. Since January, 1972, the drum has been used as part of a
hierarchically managed storage system, as a buffer between core and the
disk storage subsystem. In such systems, one attempts to keep pages
with the highest access frequencies on the fastest devices, in order to
minimize the system's mean access time. In an automatically managed
system, the identity of those pages is constantly changing. As the stack
access frequency graph discussed above can be used to associate access
frequency with stack position, page replacement algorithms identical to
those used to manage primary memory are frequently used to manage other
devices in a hierarchical memory system to achieve this end. In the
Multics system, another near-LRU algorithm is used to manage the drum, which
is described as well in the Appendix. The drum management algorithm
attempts to maintain copies of the top 2000 pages of the theoretical stack
corresponding to the LRU algorithm on the drum. The model of program
behavior implied by the LRU algorithm, and verified by the results of this
experiment, implies that these pages are the most likely to be referenced,
and at the time they are on the drum, thus have the highest access
frequencies.

As currently implemented, a page which has been faulted on, and is 
not on the drum, is read into core from the disk. It will not be written 
to the drum until the core management algorithm decides to oust it from 
core. This implies that the pages corresponding to that portion of the
LRU stack representing core are not completely a subset of those on the
drum. Hence the drum will contain pages representing a 2000-page
contiguous portion of the stack, whose topmost extreme is anywhere between
the top of the stack and the size of core below it. Of the 256 to 384K,
about 100K is not used for paging, leaving 150 to 280K for paging; thus
this variability represents about 7 to 15% of the size of the drum.

The stack-reordering procedure of the LRU algorithm is one of the
simplest possible: the referenced page moves to the top position of the
stack, as is necessary in any demand stack algorithm. The pages between
the top page and the old position of the referenced page all move down one 
position. Thus, to use the extension technique described above for the 
LRU algorithm, the only reordering information one need record is the 
identity of the page thrown out of position p (p being the size of the non- 
extended memory, in pages), which will then occupy position p+1, at each 
page fault, in addition to the identity of the page faulted on. The 
"pushed" page becomes the top of the extension stack, the page previously 
there becomes #2, etcetera, all the way down to the former position of the 
faulted-on page. This is what we have done with the Multics core-drum 
combination, considering it as a 2000+X page buffer, where X is some frac- 
tion of the size of core, itself at most fifteen percent of the drum, to 
account for the top-of-drum variability described. As positions p+1 and 
on, in our case, correspond to the disk subsystems, we need only record 
disk reads instead of page faults. (It is instructive to note that within 
the entire operation of a Multics system, not a single direct-access I/O 
transfer is done outside the paging mechanism, pre-paging being included 
in this consideration.) As disk reads are two to five per second, we 
have thus reduced our data-gathering chore by at least 95%. The experi- 
ment of Saltzer, Webber, and Snyder, which was similar in intent to this 
experiment, but more limited in scope, has already produced results (S1)
for primary memory sizes up to the maximum size of the drum. For this 
reason, we did not consider it worthwhile to attempt to gather data for 
that portion of the stack corresponding to regions in the drum. Hence, 
the application of the extension technique to this core-drum combination 
was adequate. All else that was needed was the recording of information 
provided by the drum management algorithm as to the identity of pages
thrown off the drum. It can be seen that these are the pages thrown out
of the core-drum combination, given that no page is ever thrown out of
core without having been written to the drum.

In order to determine the validity of this technique, the necessary 
programs were written and tested on a stand-alone Multics machine. This
machine had 131,072 words of core and a 256-page drum. Hence, most of the
range of the regular Multics drum was in the extension region of this ex- 
periment. The mean headway curve resulting was very well approximated by 
a straight line, suggesting the linear paging model. This provided a 
good deal of confidence in both the technique and the software. 

Hence, we see two types of motion between core-drum and disk. The 
reading of a page constitutes motion from disk into core-drum. The 
writing of a page, however, does not constitute outward motion. In gene- 
ral, writing is performed only when a copy of a page on disk is different 
from a drum or core copy. The outward motion corresponding to a read is 
really the claiming of the core or drum frame previously occupied by the 
page of interest. We call this phenomenon an "ousting".

Unfortunately, a problem arises with even this simple model. Certain 
pages of the storage system, 3 in all, corresponding to the system's top- 
level directory, are special-cased by the paging and drum-management al- 
gorithms such that they may never go on the drum. This is due to certain 
integrity issues involving the reliability of the drum and the extreme 
difficulty in reconstructing the contents of this directory. Hence, 
these pages are never written to the drum, and leave the "core" portion 
of the increasingly less theoretical LRU stack directly for the disk
portion. (In this case, writing never takes place unless the concerned page
has actually been modified (see Appendix A).) More unfortunately, these
are among the most popular pages on the disk, as by the dictates of the
LRU algorithm, they should have by all means been on the drum. Thus, we
must check for pages being ousted directly from core to disk, and we thus
have two varieties (core and drum) of ousting to be recorded. The
interpretation of the data resulting from movement of these pages will be
deferred until Chapter 3.

Thus, we need record upward stack movement into the core-drum com- 
bination, meaning disk reads, and downward movement, meaning oustings. 
Another event of interest is the creation and deletion of pages. In the 
current implementation of Multics, logical pages are created out of the 
void when a never before referenced page is referenced. By definition, 
all such pages contain zeros, and hence never involve disk reading. Fur- 
thermore, these page faults will occur regardless of what size primary 
memory is, and are thus not of interest in memory performance prediction. 
This last statement is somewhat subject to current design and user beha- 
vior. Were there a tremendous amount of fast, cheap primary memory, it 
is altogether possible that users would rarely delete programs or data, 
but simply rewrite or modify them, thus making page creation a much rarer 
event. We choose to ignore this possibility. 

In the following discussion, "n" represents the size of "primary 
memory", in pages, in terms of the extension problem. In terms of the 
specific experiment on Multics, n is the size of the core-drum subsystem 
in pages. As was explained earlier, this is the size of the drum (2000 
pages) plus a fraction of the size of core. 

Page faults which cause creation of pages involve neither disk traffic,
idle time, nor multiprogramming, and are thus not of interest in MTBPF
calculations. Since all new pages are created in this way, we find them
naturally falling past position n of the complete LRU stack, into the
extension stack after sufficiently long disuse. Page deletion, on the other
hand, can occur at any time in the life of a page. If a page is destroyed so
soon after its creation that it has never passed position n in the stack,
we are oblivious to its entire existence. If, however, it is destroyed
at such time that it is beyond position n, its destruction must be
accompanied by its excision from the extension stack. When a page is destroyed
in core or on the drum, the next page to be faulted on replaces it without
any page being pushed down the LRU stack. However, the position in the
stack of the destroyed page is assumed by the page directly under it. The
page fault following a page destruction creates only upward stack motion;
nothing is pushed down.

Consider, in our theoretical n-page primary memory system, a page in
position n of the LRU stack. This page is now destroyed. A page in
position n+m is now faulted on. In an actual memory system, this page will
now be read into primary memory without any page being replaced, the
destroyed page having created an empty page frame, but the newly faulted-on
page will be at the top of the LRU stack. The formerly first to (n-1)st
pages now become the second to nth pages in the stack. The (n+1)st (first
page not in core) to (n+m-1)st pages retain their original stack position.
(See figure 2.1). The (n+m)th position in the new stack is in a
situation akin to that of the nth position after the deletion of the page
there: the page in the (n+m+1)st position cannot come up to fill the void,
as that would not be demand paging. Until some reference is made to a further
position, say n+m+k, this will be the case. At the time the n+m+k
reference is made, position n+m+k now becomes anomalous. Hence, we create an
anomaly when a page is deleted, which propagates down the stack as any
reference is made beyond it.

The strategy that we have chosen to deal with this, in the simulation, 
is simply to excise a page from the stack when it is deleted. Thus, any 
reference to a position beyond the excision will be tallied as a reference 
to position x instead of x+1. Note, however, that the position of the
excision has then moved to x. All references in front of excisions are 
tallied correctly. The analysis of the inaccuracies resulting from this 
treatment is quite involved, and is covered in detail in section 3.4.3. 

Thus, the data items which must be recorded in a trace are those re- 
presenting 1) reading of pages into memory, by demand or prepaging, from 
disk, 2) claiming of pages by ousting pages from drum to disk or from 
core to disk, and 3) the deletion of pages from the storage system. Of 
these events, types 1 and 3 represent excision of a page from the exten- 
sion stack, while type 2 represents the pushing of a page on to the top 
of the extension stack. Events (1) also cause the noting of the stack 
position of the page read, and the incrementing of a counter corresponding 
to that position. There are actually some other events which must be re- 
corded in the case of the Multics system, but these are due to the parti- 
cular implementation of the core and drum management algorithms, and are 
discussed in Appendix B. 
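
The handling of the three event types can be sketched in Python as
follows. The event encoding (a kind string paired with a page
identifier), the function name simulate, and the sample events are
assumptions of this sketch; the additional Multics-specific events of
Appendix B are not modeled.

```python
from collections import Counter

READ, OUST, DELETE = "read", "oust", "delete"
INFINITY = float("inf")


def simulate(events):
    """Replay (kind, page) events against an LRU extension stack.

    Returns (counts, stack); counts is keyed by position within the
    extension stack, 1 being the position just beyond core-drum.
    """
    stack = []  # stack[0] is the top of the extension stack
    counts = Counter()
    for kind, page in events:
        if kind == OUST:
            stack.insert(0, page)  # type 2: push onto the top
        elif kind == READ:
            if page in stack:
                index = stack.index(page)
                counts[index + 1] += 1  # note the position read from
                del stack[index]        # type 1: excise the page
            else:
                counts[INFINITY] += 1   # first reference in the trace
        elif kind == DELETE:
            if page in stack:
                stack.remove(page)      # type 3: excise, no tally
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    return counts, stack


# Hypothetical events, for illustration only.
events = [(OUST, "p1"), (OUST, "p2"), (OUST, "p3"),
          (DELETE, "p2"), (READ, "p1"), (READ, "p9")]
counts, stack = simulate(events)
print(dict(counts), stack)
```

In this example the read of p1 is tallied at position 2 rather than 3,
because the deleted page p2 was excised from above it.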

The handling of page reads of pages which cannot be found in the 
stack, i.e., their first reference, requires some thought as to
interpretation. Were this experiment run for a sufficiently long time, the
appearance of such pages would cease. Pages which are created come down
on the top of the stack, and any existent page which has not been
referenced since the experiment began enters the core-drum extension-stack
combination once and never leaves, until it is destroyed. These first
references, as discussed, are counted in the "infinity" position of the stack.
These fetches of pages not in the stack accounted for roughly a tenth
(7881/74530) of all disk page fetches. These references do not affect the
relative number of fetches to any two extension stack positions, as they
would not be in a core-drum memory of any size until the first time they
were referenced. Thus, when one considers disk accesses, one should
consider these reads to be disk references in a core-drum system of any size.
However, the longer the experiment runs, the fewer will these references 
become. Thus, since we are interested in steady-state behavior, we have 
chosen to consider these reads a start-up transient, and not count them 
in any calculation. They tell only of the length of the experiment, not 
of what is being measured. 



2.3 Performing an Experiment on Multics 

Having developed the theoretical bases of the extension problem, and 
adapted it to Multics, the next step was to proceed to construct the
necessary software to develop the extension address trace, collect it, 
and perform the LRU stack simulation with it. 

A privileged-access facility was set up in the Multics hardcore super- 
visor specifically for this experiment. When enabled, a trace of all of 
the events mentioned above was accumulated in a circular 1008-word buffer. 
Each trace item included the physical device address of some page being 
read, ousted from core-drum, or deleted, information as to which of these 
events is represented, and a flag indicating, for statistical purposes, 
whether or not it was one of the previously mentioned pages, which are not 
allowed to go on the drum. Also recorded was information allowing the 
program which inspects this buffer to synchronize itself with it
correctly. A program was developed which inspected this buffer regularly
(from the Multics standpoint, a privileged operation). This program
assembled the buffer images into a continuous trace, which could be as long
as necessary, suitable for further, repeated processing. 

This strategy was decided upon because of the extensive time required 
to search an LRU stack for a given page, and the large amount of space re- 
quired to store this stack. This ruled out the possibility of having a 
special-purpose module of the Multics supervisor perform the experiment 
in real time. The performance degradation necessitated by the time
required to search and the space, which would have had to be non-pageable,
to store the LRU stack would have been wholly unacceptable. Furthermore,
the accumulating of the trace data for further processing allows many
programs and versions of programs to be run on this data, increasing both its
usefulness and the accuracy of the results obtained from it.

One disadvantage of the data collection strategy described is the pos- 
sibility of the data collecting program losing synchronism with the circu- 
lar trace buffer, i.e., data being overwritten by new data be- 
fore it has been duly noted. This situation can come about when the data 
collecting program has made a decision about how often the buffer should 
be sampled, and an intense unexpected burst of activity causes the buffer 
to be written into significantly faster than before. The data-gathering 
program samples the buffer again, and notices that data has been lost, 
but anticipating further loss reschedules itself. Another way that data 
can be lost from buffer mis-synchronization is the data-gathering pro- 
cess falling behind in the multiprogramming queue due to Multics sched- 
uling policies and heavy user load. The implementation of the data- 
gathering program tried to compensate for this by being written as a multi- 
process program, i.e., a program running in a coordinated way in many pro- 
cesses at once. Not only did this give it a scheduling advantage, but in- 
creased the reliability of the data-gathering operation as a whole. 

Unfortunately, data losses of the types described were common, espe- 
cially in initial, developmental runs of this software. The greatest 
losses would typically occur at midnight, when a large number of user
programs scheduled to run then would do so, creating heavy paging activity
referencing pages neither in drum nor core, and only one or two processes would
be supporting the data-gathering operation. The extents and analysis of 
these losses are considered in the next chapter. 

A danger of running a large complex data-gathering system in many
processes is that of creating a great deal of activity which would bias the
result of the measurements by measuring itself. The sharing features of
the Multics system helped counterbalance this effect: all of the data
bases and procedures of the data-gathering system were fully shared, having
only one copy. Only the per-process work areas were not shared. The
actual data-gathering system, in order to handle the control of multiple
processes, possibly multiple terminals, and dynamic scheduling was, in
fact, quite complex, requiring six separate procedures. The shared data 
bases and procedures totalled ten pages. Approximately two pages of work 
area per process were needed. 

The data gathered was stored in data segments in the Multics virtual 
memory. The stack simulation was subsequently performed, using this data 
as the extension address trace, exactly as described above. The procedures 
which performed this reduction ran in an unrestricted Multics environment,
and hence had practically no restriction on time or space. The LRU stack
was represented as a list, in which each node represented a stack
position. A push of a page onto the top of the stack required the allocation
of a new node, and the redefinition of this node as the top of the stack.
This node was then made to point to the former stack top. The excision of
a page from the stack required locating the node corresponding to this page
(each node contained a physical page address), the deallocation of this
node, and the reconnecting of the list around it. For trace data
representing disk reads, however, it was necessary to ascertain the position in
the list of the relevant page. This required a search of the entire list.
In order to reduce the work of discovering that a page was not in the list
at all, a bit table was constructed, describing, for each possible physical
disk address, whether or not it was in the list at all. This saved the
necessity of walking the entire list for such pages.

The above list-maintenance algorithm spends a great deal of time
searching the list to determine the position of pages in it. Several al- 
gorithms were considered to avoid the seemingly crude strategy of linear 
search, but most of these algorithms caused the list to grow increasingly 
disorganized, requiring periodic time consuming re-organizations, or re- 
quired large amounts of data movement, a poor approach in a paged system. 
Because of the availability of a stand-alone machine which could easily 
provide the computer time necessary to perform this processing, the develop- 
ment of a better list-maintenance algorithm was not pursued further. 

The result of the stack simulation was a table, describing for each 
position in the extension stack, how many times a page in that position 
had been referenced. The sum of all of these counts, plus those at posi- 
tion "infinity", represented the total number of all page fetches from 
disk during the period of the measurement. Although a graphical display 
of this information is of some interest, the calculation of MHBPF was the 
immediate objective. Thus, a table was created displaying, versus exten- 
sion stack position, the total number of fetches observed divided by the 
sum of the counts for all of the positions further down the stack. For 
a given position N, the interpretation of this number, x, is as follows: 
had memory (core/drum) been extended N pages above its actual size, we 
would make one disk reference under that circumstance for every x refer- 
ences we make now. We thus refer to x as 'references per exception'. 
Note that we have not included the "infinity" fetches in the 'total
reference' count in the actual results shown here, for the reasons
discussed in section 2.2.
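
The construction of this table can be sketched in Python as follows. The
counts are invented for illustration and the function name is
hypothetical; counts[i] is taken to be the number of fetches observed at
extension stack position i+1, with the "infinity" fetches already
excluded.

```python
def references_per_exception(counts):
    """Map each extension N (in pages) to total fetches divided by the
    fetches made to positions beyond N."""
    total = sum(counts)
    beyond = total  # with no extension, every fetch goes to disk
    table = {}
    for extension in range(len(counts)):
        if beyond == 0:
            break  # no fetches remain beyond this point
        table[extension] = total / beyond
        beyond -= counts[extension]  # position N+1 is now captured
    return table


# Hypothetical counts for extension stack positions 1 to 4.
counts = [40, 30, 20, 10]
table = references_per_exception(counts)
for extension, ratio in table.items():
    print(extension, round(ratio, 2))
```

An extension of zero pages always yields a ratio of 1, since every
observed fetch is then still a disk reference.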

One may choose to interpret x as the "relative increase in mean
headway", which is to say, the factor by which mean headway will increase over
its current value if the extension of N pages is made to core-drum. For
instance, if "references per exception" were 5 at N = 4000 pages, the
interpretation would be: If we added another 4000 pages to the drum, we
would fault to the disk one-fifth as often as we do now. This
constitutes, in one sense, the raw data result of the experiment. Now,



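\[
\begin{aligned}
&\frac{\text{References observed}}{\text{References beyond position } N}
  \times \text{Mean time between measured references} \\
&\quad = \frac{\text{References observed}}{\text{References beyond position } N}
  \times \frac{\text{Time duration of experiment}}{\text{References observed}} \\
&\quad = \frac{\text{Time duration of experiment}}{\text{References beyond position } N} \\
&\quad = \text{Mean time between references beyond position } N \\
&\quad = \text{Expected mean time between references were memory extended by } N
\end{aligned}
\]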

Multiplying this number by the measured system headway in virtual 
memory references during the experiment, and dividing by the time dura- 
tion of the experiment, we obtain the expected mean headway between page 
faults were memory extended by N. 

We have displayed both references per exception, versus memory exten- 
sion size, and predicted inter-reference headway, as a function of memory 
size. 

Note that in all of this discussion, mean time computed from an
experiment taking many hours must be taken quite literally. The averaging
effect over a day of usage varying between a heavy user load and solely
the data-gathering program* produces a result which is really applicable
to neither, but only a theoretical load somewhere in between. For this
reason, we feel that 'references per exception' is a more useful
interpretation of the results of this experiment than 'mean time between
exceptions'. Attempting to tune a system to the theoretical point described
by such measurements will not help the system when it needs the most help. 

The reference counts and references per exception were subsequently 
displayed in printed tabular form, and the references per exception ver- 
sus stack position plotted on a Stromberg-Carlson 4020 Microfilm Recorder. 
Some of these results are reproduced and discussed in detail in the next 
chapter.



*Some more precise descriptions of the exact user load during the experi- 
ment are provided in Appendix C. 




3.2 The Results of the Experiment 

The form that we have chosen to display is that of an "exception 
ratio", or MHBPF(E)/MHBPF(n), where n is the adjusted 'primary' memory 
size (core-drum) as explained in section 2.2, and E is a hypothetical 
memory, both in pages. This exception ratio is the quantity expressed by 
equation 1.6. We plot this ratio versus primary memory extension in 
figure 3.1. We express the abscissa of our graph as 'memory extension',
which is the hypothetical increase of core-drum instead of absolute memory
size because of the variability of the size of core-drum as discussed in
Chapter 2. The size of core-drum is not the sum of sizes of core and
drum, because of duplications, created pages in core which have not been
copied out to drum, and possibly even different configured sizes of core.
The extension size to core-drum is meaningful however, because the data
and results derived from the measured data represent the behavior of a 
hypothetical extension of the given size, oblivious to all of the above 
considerations. If a figure for the size of core-drum is needed, 2100 
pages is reasonable. The shape of this graph suggests an exponential be- 
havior. Thus, we next plot this ratio on a logarithmic vertical axis, to 
better view this behavior. This is figure 3.2. The plot almost traverses 
the graph diagonally, suggesting the straight line which would correspond 
to an exponential. We have drawn a straight-line approximation, which 
corresponds to 

$$\frac{\mathrm{MHBPF}(E)}{\mathrm{MHBPF}(n)} = 3.42\, e^{(E-n)/(7.00 \times 10^{7}\ \mathrm{bits})} \tag{1}$$

The surprising closeness of the dtm 21 and dtm 23 plots gives some
confidence in this result. A similarity to the unpublished Burroughs
result (S3) mentioned in Chapter 1 may be noted, subject to the limitations
discussed there. This similarity extends only to Schwartz's observation of
an exponential mean-headway function for missing data in primary
memory as memory size was increased.
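
As a rough illustration of equation (1), the following Python sketch evaluates the straight-line (exponential) approximation for a given memory extension. The function name, the constant names, and the page size of 1024 words of 36 bits are assumptions made for this sketch; the text gives E and n in pages and the scale constant in bits without stating the conversion. The fit describes only the upper region of the plot, so values for small extensions should not be read as measurements.

```python
import math

# Assumption for this sketch: one Multics page is 1024 words of 36 bits.
BITS_PER_PAGE = 1024 * 36

# Constants read off the straight-line fit of equation (1).
FIT_COEFFICIENT = 3.42
FIT_SCALE_BITS = 7.00e7


def exception_ratio(extension_pages: float) -> float:
    """Fitted MHBPF(E)/MHBPF(n) for an extension of E - n pages."""
    extension_bits = extension_pages * BITS_PER_PAGE
    return FIT_COEFFICIENT * math.exp(extension_bits / FIT_SCALE_BITS)


if __name__ == "__main__":
    for pages in (2000, 4000, 8000):
        print(f"{pages:5d} pages -> ratio {exception_ratio(pages):8.1f}")
```

Under these assumptions the fitted ratio grows by a factor of e for every 7.00 x 10^7 bits of extension, which is a little under 1900 pages.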

It can be seen that the low regions, i.e., below 2000 pages above the
drum, fall short of the exponential approximation suggested. Thus, we
provide figure 3.3, in which we display the low region of figure 3.1. This
plot shows a decidedly less than exponential behavior. From one viewpoint,
this is comforting, as the experiment of Saltzer, Webber, and Snyder (S1)
measured this function in this range, and obtained a linear function.
However, our plot seems to grow considerably faster than linearly in this
region. This can be seen as a noticeably different change in behavior.

An attempt to resolve these contradictions is provided by figure 3.4,
in which both memory extension and exception ratio are plotted
logarithmically. It is seen that the higher regions of this plot approach
quadratic slope, and the increasing slope lends more confidence to the
exponential suggested above. However, no uniform quadratic curve suggests
itself.

The most that can be said is that MHBPF(E)/MHBPF(n) is a function of
at least second order as E-n exceeds 3000 pages.

We also provide figure 3.5, in which we plot MHBPF(E) directly as a
function of memory extension, by multiplying the ordinate of figure 3.1 by
the measured MHBPF on both measurements. Note that a given exception ratio
does not correspond to the same MHBPF on both measurements, as different
mean headways were observed. This is due to the variability of the system
load on these days. The difference, however, is not very significant.

the polynomial headway function MHBPF(x) = ax^k giving in general

$$p_k(x) = \frac{k\, n^{k}}{x^{k+1}} \tag{12}$$

which subsumes (8), (10), and (11). The exponential model (9) is the only
one of these probability distributions which is characterized by an
independent parameter, γ. Letting λ = 1/γ, we rewrite (9) as

$$p(x) = \frac{1}{\lambda}\, e^{-(x-n)/\lambda} \tag{13}$$

λ has the dimensions of pages. It in some sense characterizes a 'radius
of locality of reference' of the programs running. It is the mean fetch
depth into the extension stack.
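
The properties claimed for the exponential model can be checked directly from its density. The short derivation below treats stack depth as a continuous variable, which is an approximation assumed here for the sketch rather than something stated in the text.

```latex
\int_{n}^{\infty} \frac{1}{\lambda}\, e^{-(x-n)/\lambda}\, dx = 1,
\qquad
\int_{n}^{\infty} (x-n)\, \frac{1}{\lambda}\, e^{-(x-n)/\lambda}\, dx = \lambda,
\qquad
\int_{E}^{\infty} \frac{1}{\lambda}\, e^{-(x-n)/\lambda}\, dx = e^{-(E-n)/\lambda}.
```

The first integral confirms normalization over the extension stack, the second gives the mean fetch depth, and the third is the fraction of extension-stack fetches that would still miss a memory of size E. Under this model the exception ratio therefore grows as e^{(E-n)/λ}.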
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3.4 Accuracy of the Reported Results 

The question of accuracy of the results of a supposedly deterministic 
simulation seems at first to be unnecessary. However, this simulation was 
based upon a measurement. Thus, the techniques used to interface this ex- 
periment to the Multics system (see Appendix B) became a source of inac- 
curacy. Furthermore, the behavior of anomalous pages (the so-called "glo- 
bal transparent paging device" pages) caused significant deviation from 
the assumed LRU model. The deletion of pages in the LRU list created
problems, as an inordinate amount of effort would have been required to
handle these correctly (see Chapter 2).

Thus, we will consider three sources of inaccuracy: lost data, glo- 
bal transparent paging device pages, and list deletions. 
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3.4.1 The Effect of Lost Data 

The loss of data was due to failure to retrieve it from the Multics 
supervisor before it was overwritten. This was a consequence of the cir- 
cular buffer strategy chosen to solve the problem of real-time storage of 
this data. These strategies were discussed in detail in section 2.3. 

The effects of these losses are twofold: some counters for stack
positions were not incremented for lost data, and the ordering of the stack
was affected by this lost data. We consider these problems separately.
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3.4.1.1 Lost Counter Accuracy 

In order to deal with either of these problems, we assume that lost 
data has no correlation to page reference patterns. We thus deduce that 
it shares the same distribution over stack position as the successfully 
accumulated data, and the shape of the resulting histogram is not
severely affected by this loss. For the measurement "dtm 23", the most
successful and accurate of those made, 435 trace items were lost out of a
total of about 200,000 successfully recorded items. This represents a
total inaccuracy in counting of less than one quarter of a percent. For
the slightly less accurate "dtm 21", 1200 items were lost at various times.
Measured against the 150,000 items successfully collected here, this is
still less than one percent.
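
The two percentages quoted above follow from simple division. A minimal Python check, using only the counts given in the text (the function name is an assumption of this sketch):

```python
def lost_fraction(lost_items: int, recorded_items: int) -> float:
    """Lost trace items measured against successfully recorded items."""
    return lost_items / recorded_items


if __name__ == "__main__":
    # Counts as quoted in the text for the two measurements.
    print(f"dtm 23: {100 * lost_fraction(435, 200_000):.3f} percent")
    print(f"dtm 21: {100 * lost_fraction(1200, 150_000):.3f} percent")
```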
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3.4.1.2 Stack Shifting Inaccuracies 

This type of inaccuracy, resulting from inaccurate reconstruction of
the LRU extension stack, is considerably more subtle, and damaging in its
effect. Failure to notice certain movement into and out of the extension
stack causes the stack to stray progressively farther from a realistic re- 
construction. Items remain in the simulated extension stack which were 
in fact removed by lost data, and items which should have been pushed on 
its top are not so pushed. The result of items not being removed
correctly is twofold: first, the item will appear twice in the stack when it
is pushed out legitimately onto the top of the extension stack in the
future, and second, items further down the stack than the false appearance
will have their stack position incorrectly recorded. The double appearance
is only a problem because of the latter effect. The stack-managing algorithm of
the simulation program used a bit table to record the known presence of 
every storage system page in the extension stack. Thus, the legitimate 
pushing (the second time) of such a page has no effect, and the later 
fetching of that page from the extension stack fetches the correct in- 
stance, and the bit table indicates the page as no longer being in the 
stack. 
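
The role of the presence table can be sketched in a few lines of Python. This is not the simulation program of the thesis; the class name, method names, and the use of a Python list and set are assumptions made purely for illustration. A push of a page already marked present is ignored, and a fetch reports the one-based stack position before removing the page and clearing its mark.

```python
class ExtensionStack:
    """Toy model of the simulated extension stack with a presence table."""

    def __init__(self):
        self._stack = []        # index 0 is the top of the extension stack
        self._present = set()   # stands in for the bit table

    def push(self, page) -> bool:
        """Push a page ousted from core-drum; ignored if already present."""
        if page in self._present:
            return False
        self._stack.insert(0, page)
        self._present.add(page)
        return True

    def fetch(self, page):
        """Return the 1-based stack position of page, or None if absent."""
        if page not in self._present:
            return None
        position = self._stack.index(page) + 1
        del self._stack[position - 1]
        self._present.discard(page)
        return position


if __name__ == "__main__":
    stack = ExtensionStack()
    for page in ("a", "b", "c"):
        stack.push(page)
    print(stack.push("a"))   # second push of a page still marked present
    print(stack.fetch("a"))  # position at which the stale entry is found
    print(stack.fetch("a"))  # the page is no longer in the stack
```

In this toy model a lost fetch leaves a stale entry behind, so the later legitimate push is ignored and the page is eventually found at the stale, deeper position.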

The result of items not being pushed because of failure to record 
their pushing is similar. Their absence at the top of the extension stack 
causes all items below them, which is the entire stack, to have their posi- 
tions incorrectly recorded. These items appear one position higher than 
they should be, for each missing item. Thus, until the missing item is 
later requested from the stack, by virtue of a recorded fetch, all items 
which were on the stack before the failure to place the missing item will
have their positions incorrectly tallied*. Also, a later reference to the
unpushed page will be recorded as a transient, not-in-stack page fetch, as
discussed in section 2.2. The effect of not accounting this fetch to any
stack position was already discussed.

Note that pushing and then fetching any single page has no effect on
the extension stack orderings before the pushing and after the fetching.
That is to say, if the pushing and fetching were not recorded at all, only
the stack orderings between the two would be incorrect. Thus, if at any
one time a data retrieval from the hardcore supervisor notices that X data
items were lost, all such push-fetch pairs within the X lost data items
have no stack-reordering inaccuracy associated with them, and only the
lost-counter inaccuracy occurs. A reference-push pair, on the other hand,
causes both types of inaccuracy. Although it seems evident, by locality of
reference, that any string of contiguous lost data items must contain a
large number of push-fetch pairs (i.e., a page recently pushed out of
core-drum is one of the most likely to be fetched back in soon), a more
careful mathematical analysis shows this to be false. Based upon
parameters derived from the data accurately recorded in "dtm 23", 106 page
fetches within a string of contiguous lost data items will statistically
include only 4 fetches of pages pushed within the lost data items.

We must thus assume the worst case, that every lost data item was in
fact a push or a fetch not properly matched within the lost data. Hence,
for the 435 lost trace items in "dtm 23", the effect of losing this data
could not have been worse than the pushing of 435 never-referenced-again
pages on the top of the extension stack, or the excision of 435 random
points from the stack. In the first case, the result is that later
fetches are accounted to lower-numbered positions than they should have
been, and in the second case, some later fetches are accounted to positions
of higher number than they should have been. In either case, the total
effect is that of an uncertainty of either +435 or -435 on the stack
position axis of any derived graph. In reality, the lost data must contain
an almost equal number of pushes and fetches. (By conservation of pages,
the difference must be exactly the difference between pages created and
destroyed in core-drum.) As a result, a typical misplaced page in the
extension stack will suffer an average displacement of 435/2, due to the
lost fetches. Hence, the total uncertainty in the stack position axis of
any derived graph is not greater than plus or minus one-half the number of
lost data items. For "dtm 23", this is 218 positions. Most of our graphs
are plotted to a resolution of 500 stack positions. Compared to the
8000 or so positions of interest, this inaccuracy is not very significant.

Summarizing, the effect of lost trace data is seen both as lost
accuracy in counting and as uncertainty in the stack-position axis of
derived graphs. Both uncertainties are proportional to the amount of lost
data.
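
The size of the stack-position uncertainty relative to the plotting resolution can be checked with a few lines of Python. The function name is an assumption of this sketch; the figures of 435 lost items, a resolution of 500 positions, and about 8000 positions of interest are those quoted in the text.

```python
def position_uncertainty(lost_items: int) -> float:
    """Half the number of lost items, as argued in the text."""
    return lost_items / 2


if __name__ == "__main__":
    uncertainty = position_uncertainty(435)
    print(f"uncertainty: +/- {uncertainty} positions")
    print(f"fraction of plot resolution: {uncertainty / 500:.2f}")
    print(f"fraction of range of interest: {uncertainty / 8000:.3f}")
```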
3.4.2 Global Transparent Paging Device Inaccuracies

This type of inaccuracy results from the special handling of the
system's top level directory pages, three in number. These pages are
ousted from the core-drum combination prematurely, as dictated by a system
reliability policy which attempted to insure the integrity of these pages
by keeping them off of the drum. As a result, they were in fact used more
often than the pages legitimately at the top [illegible], and had no right
to be in this stack at all. That is to say, they would have been on the
drum at almost all times had they not been so special-cased. The anomalous
effect of these pages was first observed [illegible] work of this thesis.
An experiment designed to discover the significance of this effect
revealed that on some days, between one-half and one-fifth of all Multics
disk traffic was a result of these anomalously-used pages. Thus, our
experiment was modified to note in the trace data when such pages were
being fetched or ousted from core-drum. The mechanism by which this
information was tra[illegible] is shown in Appendix A.

The predominant inaccuracy caused by these pages is a distortion of 
the very low end of the f(x) and r(x) curves (see section 1.3). The stack- 
reordering inaccuracy created by these pages cannot be more than plus or 
minus three positions (as there are only three of these pages) at any 
point in time or stack, and is thus totally insignificant. As these 
pages rightfully belong on the drum, they are usually fetched very soon 
after they are ousted, and thus, never migrate very far down the stack. 

Thus, many reference counts at low-numbered stack positions are
attributable to these pages. If the core-drum combination were extended by
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any finite amount, and managed as currently done (i.e., at the time of the 
experiment), the anomalous references would still appear outside of core- 
drum. One should thus either consider these references to be to stack
position "infinity", meaning that they would be disk references no matter
how far core-drum were extended, or simply regard them as "highly
anomalous" and not consider them at all. The latter course, which we have
chosen here, is equivalent to ignoring the effect of these pages on the
extension stack ordering, and considering them to be outside the domain of
the extension stack, that is, in core-drum. The effect of removing
references to such pages on MHBPF(n) is easily calculated. Starting from
equation 3.4, we multiply both sides by MHBPF(n), and obtain

$$\mathrm{MHBPF}(E) = \mathrm{MHBPF}(n)\, \frac{\sum_{t=n+1}^{\infty} r(t)}{\sum_{t=E+1}^{\infty} r(t)}$$

If E is greater than the deepest position in the extension stack to which 
any of the anomalous pages ever migrates, the only place in this equation 
where anomalous pages are counted is MHBPF(n). The effect of removing the 
anomalous fetches from this quantity is simply to scale it proportionately 
to the number of page fetches to be not considered. That is, if T page 
fetches (other than the "startup transient" fetches of section 2.2) were 
observed, A of them to the anomalous pages, 

$$\frac{\mathrm{MHBPF}(n)_{\text{adjusted}}}{\mathrm{MHBPF}(n)_{\text{old}}} = \frac{\mathrm{MHBPF}(E)_{\text{adjusted}}}{\mathrm{MHBPF}(E)_{\text{old}}} = \frac{T - A}{T}$$

This ratio was observed to be between .92 and .96 for the measurements
"dtm 23" and "dtm 21" displayed here.

Summarizing, the anomalous drum-abhorring pages create an inaccuracy
of about 4 to 8 percent in the headways and headway ratios calculated from
the measured data.
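
The relation between the quoted ratio and the quoted percentage can be made explicit. The sketch below computes the ratio (T - A)/T as written in the text and the share of fetches attributed to the anomalous pages; the function names and the sample counts are assumptions chosen only so that the ratios match the two ends of the reported range.

```python
def adjustment_ratio(total_fetches: int, anomalous_fetches: int) -> float:
    """The ratio (T - A) / T as given in the text."""
    return (total_fetches - anomalous_fetches) / total_fetches


def anomalous_share(ratio: float) -> float:
    """Fraction of fetches attributed to the anomalous pages, A / T."""
    return 1.0 - ratio


if __name__ == "__main__":
    # Hypothetical counts, picked to reproduce ratios of .92 and .96.
    for total, anomalous in ((100_000, 8_000), (100_000, 4_000)):
        ratio = adjustment_ratio(total, anomalous)
        print(f"ratio {ratio:.2f} -> share {100 * anomalous_share(ratio):.0f} percent")
```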




3.4.3 Inaccuracies Resulting from List Deletions 

These inaccuracies result from a design decision to implement a simple, 
fast list-maintenance algorithm, as correct treatment of these deletions 
would require a fairly time-consuming technique. Hence, we proceed to ana- 
lyze the extent of the inaccuracies resulting from this inaccurate treat- 
ment of deletions. 

Recall from section 2.2 that the deletion of pages in core-drum does
not affect the ordering of the extension stack. Such a deletion implies 
that a fetch into core-drum will occur with no corresponding ousting. As 
this is in fact what happens, there is no inaccuracy involved with core- 
drum (or "out of list") deletions. 

The deletion of a page from the extension stack creates a "moving 
anomaly", as discussed in section 2.2. All references to pages in posi- 
tions in front of the anomaly (which occupies the position of the deleted 
page in the extension stack) are tallied correctly. The first reference 
to a page behind the anomaly is tallied incorrectly, because we chose not 
to record the anomaly. It is recorded as being one page closer to the top
of the extension stack than it should have been. However, the anomaly now 
moves down to the position of that fetch. The situation is now the same 
as had the page just referenced been deleted. References in front of that 
position are tallied correctly, and exactly one reference behind it is 
tallied incorrectly, and the anomaly moves down. 

We proceed to analyze the motion of such an anomaly down the extension
stack. In the worst case, the page deleted was at the very top of the
extension stack, and thus, the next reference is guaranteed to be tallied
incorrectly. Probabilistically, this reference will be to that extension
stack position which is the mean of the distribution f(x), the measured
reference frequency distribution. There is a probability of q that this
reference will be so far down the LRU stack that a fraction q of the
weight of the distribution f(x) is below it. Now for the measurement
"dtm 23", there were about 2000 in-list deletions per 50,000 in-list reads.
This means that there was one deletion per 25 reads. In order to calculate
the probability that a deletion anomaly is past a certain depth in
the extension stack by 25 reads, we consider the experiment, tried 25 times,
of encountering a read at least that far down the stack. The probability
of success of one read being in that portion of the stack where a fraction
q of the weight of f(x) is left is exactly q. The probability of a
failure is $(1-q)$. The probability of 25 failures is $(1-q)^{25}$. The
probability of at least one success in 25 tries is one minus the
probability of exactly 25 failures, or $1-(1-q)^{25}$. For q = .1, i.e., the
ninety percentile point of f(x), it is .93. For q = .05, it is .72. Hence,
by the time the next deletion is recorded, it is quite likely that the
anomaly generated by the previous deletion is quite far down the extension
stack. Hence, for the upper portion of the extension stack, the effects of
deletions do not accumulate. Each deletion generates an inaccuracy of
one stack position for each read behind it, but the corresponding anomaly
moves sufficiently rapidly down the extension stack that the effects of
later deletions are independent. Thus, for the upper portion of the
stack, the result of these deletions is a total uncertainty of one stack
position, a negligible amount.
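
The two probabilities quoted above can be reproduced with a short Python function. The function name is an assumption of this sketch; the default of 25 reads per deletion comes from the text.

```python
def prob_anomaly_past(q: float, reads: int = 25) -> float:
    """Probability that at least one of `reads` reads lands in the deepest
    fraction q of the reference distribution: 1 - (1 - q)**reads."""
    return 1.0 - (1.0 - q) ** reads


if __name__ == "__main__":
    for q in (0.1, 0.05):
        print(f"q = {q}: {prob_anomaly_past(q):.2f}")
```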

The above reasoning correctly implies that the anomalies resulting
from deletions accumulate at the lower reaches of the extension stack,
and that fetches from these regions have a large cumulative error. In
order to analyze this effect, we construct a queueing theoretical model of
list deletions. A deletion anomaly causes the first fetch to a position
behind it to be off by one position. Several deletion anomalies in the
extension stack cause the first fetch behind them to be off by that many
positions (the number of deletion anomalies). However, once this first
fetch has occurred, the next fetch from this position will be off by one
less position, and so on, until all of the deletion anomalies have moved
behind the position in question, and fetches from this position are tallied
correctly. Thus, we may construct the following interpretation: The
deletion anomalies in front of position p form a queue. Each fetch behind
position p "services" one request, i.e., removes one item from the queue.
The rate of arrivals to this queue, in the worst case (all deletions from
the very top of the extension stack), is the rate of deletions. The rate
of service is the rate of references to stack positions behind position p.
The length of the queue is the number of outstanding anomalies in front of
position p, which is the total error in stack position by which fetches
from position p will be tallied. Assuming exponentially distributed
arrivals and services, with respective mean rates λ and μ, the average
queue length at position p is known to be L = 1/(1-λ/μ) from queueing
theory. λ/μ, the ratio of arrival rate to service rate, is the total
number of deletions (assuming the worst case) divided by the number of
references past position p, both immediately obtainable from the measured
data. As there were 2000 total deletions*, queue length approaches
infinity at the point in the extension stack where 2000 references were
counted below that point. This point is at 3650 pages depth. At 500
positions before this, queue length is down to 4. At only 100 positions
before it, queue length is 20. Hence, up to 3000 pages, the effect is
negligible. At positions below that point where there are 2000 page
fetches recorded below, the queue grows faster than it is serviced. At the
time the experiment is stopped, the number of deletion anomalies remaining
in queue in front of position p', where p' is beyond the 2000 reference
point in the extension stack, is the total number of deletions (in the
worst case) minus the number of references beyond position p'. As both of
these quantities presumably grow at a constant rate, the average error in
stack position of a recorded reference to position p' is one-half this
queue length. This allows us to reconstruct an approximation of the
correct r(x) and f(x), and then all of the resulting curves, by recreating
a better, more accurate r(x), r'(x), as

$$r'(x) = r(x) \quad \text{for } t(x) \ge 2000$$

$$r'\!\left(x + \frac{2000 - t(x)}{2}\right) = r(x) \quad \text{for } t(x) < 2000$$

$$\text{where } t(x) = \sum_{y=x}^{\infty} r(y)$$

This implies that at the very tail of the distribution, there is a stack
position inaccuracy of 2000/2 = 1000 positions. At a stack depth of 5000
positions, there is an inaccuracy of 500 positions. This does not
seriously affect the shape of the exception ratio and MHBPF curves in the
region of interest, as one can see from figure 3.6. We have re-plotted
here figure 3.1, corrected as above.

*In measurement "dtm 23".
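
The queue-length expression and the position correction described above can be sketched in Python as follows. The function names are assumptions of this sketch, and the queue-length formula is the one quoted in the text, 1/(1 - λ/μ); the usual textbook mean number in an M/M/1 system, ρ/(1 - ρ), is smaller by exactly one, which does not change the conclusion that the error is small until λ/μ approaches one.

```python
import math


def mean_queue_length(deletions: float, refs_past_p: float) -> float:
    """Average number of outstanding anomalies ahead of position p,
    using the expression 1 / (1 - lambda/mu) quoted in the text."""
    rho = deletions / refs_past_p
    if rho >= 1.0:
        return math.inf
    return 1.0 / (1.0 - rho)


def tail_counts(r):
    """t[x] = sum of r[y] for all y >= x."""
    out = [0] * len(r)
    running = 0
    for x in range(len(r) - 1, -1, -1):
        running += r[x]
        out[x] = running
    return out


def position_shift(t_x: float, deletions: float = 2000) -> float:
    """Average stack-position error where fewer than `deletions`
    references were counted at or below the position."""
    return max(0.0, (deletions - t_x) / 2)


if __name__ == "__main__":
    print(mean_queue_length(2000, 2000))   # saturation point
    print(position_shift(0))               # very tail of the distribution
    print(position_shift(1000))            # where 1000 references remain below
```

A queue length of 4 corresponds to λ/μ = 0.75 and a queue length of 20 to λ/μ = 0.95 under the quoted expression.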
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3.4.4 Other Inaccuracies 

Another possible source of inaccuracy was the forced oustings of
pages to disk directly from core, not due to global transparency to the
paging device. A close inspection of "try_to_write_page" in Appendix A
reveals that pages which should be ousted from core to the paging device
(the drum) are occasionally ousted to disk, because there is no room on
the paging device. This action avoids recursion in the process of finding
a free core frame, as the latter process would otherwise possibly involve
ousting pages from drum, which could require finding a free core frame.
Although we do not have data on the frequency of this occurrence on the
days of the experiment, we have observed Multics at other times, and the
percentage of disk writes caused by such oustings is less than a tenth of
a percent of all disk writes. It is true of Multics that the ratio of
reads to writes remains fairly constant. Each read corresponds to one
page fetch, and each fetch must be accompanied by an ousting at some
time. Hence, 'forced' oustings must be a similarly small percentage of
all oustings, and not a significant effect.
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3.5 Correlation between dtm 21 and dtm 23 

Observing figure 3.2, the correlation between the two plots is fairly 
remarkable. Within any reasonable accuracy for what is meant to be used 
in engineering approximations, these curves represent a measurement of 
the same quantity. 
3.6 Our Result and the Linear Model Measurements

Our mean headway curve (figures 3.1, 3.3) shows distinct differences
from the curves given in S1 for the mean headway function of the Multics
system. We can attempt to rationalize these differences by understanding
the nature of the user load during the two different experiments.

The measured mean headway between disk page faults, in terms of
virtual memory references per disk fault, was between two and three times
the figure measured in S1. What is more, the slope of the two curves
differs, ours starting out at almost six times the slope of the curve in
S1. We attribute this to differing values of λ in equation 3.13, in terms
of the model proposed in this experiment. More specifically, the
'tightness' of working sets was greater for our experiment, and the number
of distinct users was fewer, causing even greater tightness of the
system's "combined working set" at any time. The measurements given in S1
were made during a day of very heavy system usage, in August 1972. User
load at this time consisted primarily of systems programmers engaged in
program development, an activity which references vast extents of
libraries, tools, and specialized procedure and data. These users were
also operating without economic restriction, and thus had little incentive
to minimize the resources used by their activities. Our experiments were
conducted at a time when some of the Multics user load had shifted to the
Honeywell 6180 Multics system, in a state of development at that time. All
of the systems programmers had moved to the new machine at the time of our
experiments, and the remaining user load was quite light, consisting of
the M.I.T. academic community. The lightness of user load also implies a
smaller number of distinct users.
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The mean headway curves which can be extrapolated from our measure- 
ments may be viewed as a function of user load / system working set 
tightness, giving rise to the family of graphs of figure 3.7. From this, 
it can be seen that a linear region on a curve corresponding to a large
system working set can correspond to a nonlinear region on a graph
corresponding to a smaller system working set, and the latter will rise
faster than the former. If one draws the line C C' corresponding to the
core-drum to disk boundary, both the differences in measured headway and
the slope differences can be more readily understood.

Another factor which gives rise to the family of graphs in figure 3.7
is the transient response of the experiment. As the length of the LRU
extension stack grows, so does the observed value of λ. Especially when
user load is light (smaller number of disk references per hour), it takes
longer to develop curves of low λ than high λ, and this was the case in
our measurements. Hence, it is possible that a more extensive measurement
could have allowed a curve of higher apparent λ to result.
Figure 3.7 Family of Headway Curves of Differing λ
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Chapter 4 

Conclusions and Suggestions for Future Research 

4.1 Conclusion 

The most general and useful conclusion that can be drawn from this
thesis is that increased primary memory size decreases page fault overhead
in virtual memory systems very sharply as it grows, the decrease being
much greater than proportional to the increase in memory size.

We hypothesize that the reference patterns observed, and the headway
functions derived, are characteristic of a large-scale computer utility
being used by an academic community through interactive consoles. The data
being referenced on Multics was accessed through a virtual memory
mechanism; were it accessed via explicit disk requests on some other type
of computer utility, we would expect to see the same patterns and headway
functions.

The most specific and concrete result which we have arrived at is a
measurement of the mean headway function for Multics, showing how page
fault overhead decreases as primary memory size approaches $4 \times 10^{8}$
bits.
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4.2 The Paging Model Suggested 

The mean headway function MHBPF(x), where x is primary memory size in
pages, may be expressed as a polynomial in x,

$$\mathrm{MHBPF}(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \cdots \tag{1}$$

Saltzer's measurements suggest that

$$\mathrm{MHBPF}(x) = a_1 x \tag{2}$$

is an adequate characterization of the paging behavior of Multics in the
range $0 \le x \le 1.3 \times 10^{7}$ bits. Our experiment shows, for the
particular measurements we made, that the quadratic term in (1) becomes
significant at x = approximately $1.0 \times 10^{8}$ bits. The trends
indicated by figures 3.1 and 3.4 suggest that higher terms become
significant as x is increased further. This would be consistent with the
observation made about arbitrary increase in primary memory size
[illegible] above. Belady and Kuehner (B3) assert MHBPF(x) to approximate
$a \cdot x^{k}$ for 'real life programs'. In measurements made on IBM M44/44X
and System 360/67 machines, they found k to take values 'in the vicinity
of 2'. This model also can be described by the general representation of
equation (1) above. These observations also suggest, as does figure 3.2,
that

$$\mathrm{MHBPF}(x) = (a_0 - \beta) + \beta e^{x/\lambda} \tag{3}$$

is in some cases a valuable approximation. The constant term becomes
insignificant for sufficiently large x, and we may write

$$\mathrm{MHBPF}(x) = \beta e^{x/\lambda} \tag{4}$$

a very simple model which is very appealing. As was shown in Chapter 3,
this model corresponds to a reference probability distribution

$$p(x) = \frac{1}{\lambda}\, e^{-x/\lambda} \tag{5}$$

where x is LRU stack depth and p(x) is the probability of reference to
that position. This simple model of program behavior is particularly
appealing, as it characterizes "program size" as a distribution. Denning
(D1) has given the concept of 'working set' as a measure of program size,
within a given time interval. Equation (5) is of a more specific class
of program characterizations, expressing the 'size' of the program as a
distribution. The parameter λ may be viewed as a 'radius of locality' of
the programs running, expressing in some sense their 'tightness' or
'togetherness'. In this sense, λ is akin to the concept of working set.
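
One way to read the 'radius of locality' is to ask what fraction of references falls within a given depth under the exponential model. The Python sketch below does this, treating stack depth as continuous; the function name and the sample value of the parameter are assumptions of this sketch, not figures from the measurements.

```python
import math


def fraction_within(depth: float, radius: float) -> float:
    """Fraction of references at stack depth <= `depth` when depths are
    exponentially distributed with mean `radius` (continuous sketch)."""
    return 1.0 - math.exp(-depth / radius)


if __name__ == "__main__":
    radius = 1900.0  # hypothetical radius of locality, in pages
    for multiple in (1, 2, 3):
        share = fraction_within(multiple * radius, radius)
        print(f"within {multiple} radius: {100 * share:.0f} percent")
```

Under this model about 63 percent of references fall within one radius and about 95 percent within three, whatever the value of the radius.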
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4.3 Unanswered Questions and Future Directions 

The most obvious extension of the work presented here is to extend
upward the range of primary memory sizes for which the nature of MHBPF(x)
is known. Although the techniques used in this thesis are completely
extensible in this regard, it is not clear whether or not there is any value
in such research beyond some point. If a certain amount of primary memory
reduces secondary memory accesses to once an hour, for instance, the issue
of secondary memory access time versus cost quickly takes precedence over
primary memory cost versus secondary memory reference overhead. For
instance, our measurements predict that another seven million words of
core-drum would reduce disk references to once every two minutes. At this
rate, the economic viability of a fast disk is a greater issue than the
performance improvement resulting from more core-drum. For instance, a
large ($10^{12}$ bit) slow (1 sec access time) store might be quite
acceptable as a backing store.

Another area of research is to fully understand the program behavior 
patterns which are responsible for models of program behavior such as 
Saltzer's linear model and the model proposed above. We understand the 
working set model because we know that program loops, subroutines, etc., 
cause repeated reference to certain data items, and this behavior is some- 
what extensible to larger views of programs. We do not know what "causes" 
the linear model, or other such models in this sense. We can understand
"distribution" type models by the same considerations of 'spatial
locality' and 'temporal locality' (D2) on which the working set model and
the LRU replacement algorithm are based, but we have no insight into the
basis for any particular distribution in program behavior.
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A very major unknown limitation on the work of this thesis is the ap- 
plicability of its result. No attempt has been made to determine pre- 
cisely what aspects of Multics user behavior were responsible for the
effects noticed. It is not even clear how specific the results of this
thesis are to the LRU algorithm.

An interesting direction of research would be to perform a similar ex- 
periment on some LRU-managed system which does not utilize virtual memory. 
A large data management system utilizing an LRU-managed buffer pool might
be such a system. The common aspects of user behavior accessing a large 
on-line data base might show through here, as it is the referencing pat- 
terns, not referencing methods which are interesting. 

Systems such as IBM's OS/VS2 and VM/370, involving paged virtual
memories and multiple address spaces, provide a fruitful ground for
comparison. In these systems, features of sharing and data addressing are
quite different from those of Multics, but the amount of data to be
addressed and the primary memory usage strategy are not altogether dissimilar.

An important direction to be pursued is that of repeating this
experiment on Multics, reliably, many times, and determining day-to-day and
hour-to-hour variations in the behavior of MHBPF(x). The thrust of this
thesis was to develop and apply the techniques stated herein, and others
must use these tools to correlate MHBPF(x) to whatever factors appear
influential.

An interesting issue is to relate the parameters X and a ± in all of 
these models of program behavior to other observable parameters of pro- 
gram behavior and system configuration. This would constitute a long step 
toward theoretical understanding of the behavior which underlies these
models.

Sekino (S2) shows the significance of MHBPF(x) in system performance 
calculations, particularly throughput and response time. Although our re- 
sults may be used in these calculations, we have not pursued this course 
here.

Appendix A

A Structured Program Description of Multics Page Control



This appendix describes the functioning of the fault and interrupt 
driven mechanisms within the Multics virtual memory management algorithm as 
it existed in May, 1973, at the time of the experiment. Only the paths 
within the so-called 'Page Control' subsystem relevant to this thesis have 
been shown. This excludes some fairly complex mechanisms relating to error 
handling and the allocation of page tables. Within the paths shown here, 
however, this results in only a few small omissions. 

The aim of this appendix is to familiarize the reader with the inter- 
nal operation of page control to whatever depth is necessary for compre- 
hension of the rest of this thesis, particularly Chapter 2 and Appendix B. 
To this end, we have provided a description on several levels. 

The most detailed description of page control given here is an approxi- 
mately "structured" program, in which we have functionally modularized 
page control into 14 small routines. We have taken the liberty of creating 
a new language in which to write this program, which we explain within. 
We feel that this language conveys the general class of manipulation des- 
cribed herein with a maximum of clarity and succinctness. 

We have liberally renamed objects, substituting names which we feel 
are more mnemonic than the actual names used in Multics. We have also 
made minor modifications to control flow, and subroutinized routines which 
were not originally subroutines where we felt that clarity would be 
aided. In any case, the algorithm as given is essentially identical to 
the actual assembler-code algorithm at the time of the experiment, with
respect to state, sequencing, and side effects.

The plus sign (+) in the left-hand margin denotes references to routines
explained in detail within.

A Brief Overview



Multics manages both core and drum (the latter known as the "paging
device", or "pd") by approximations to the least-recently-used algorithm.
Two lists, the core used list and the paging device used list, are
maintained for this purpose, the top of each list designating the least recently
used page (which is the best choice for replacement), and the bottom of 
each list designating the most recently used page (which is the worst 
choice for replacement) on the respective devices. How these lists are 
maintained can best be learned by reading the program that we have pro- 
vided. The core used list contains logical descriptions of core frames, in- 
cluding pointers to descriptions of logical pages and/or paging device re- 
cords when such entities may be associated with the core frame. Similarly, 
the paging device used list contains logical descriptions of paging device 
records, including pointers to descriptions of logical pages and core 
frames, when such entities may be associated with the paging device record. 

Multics tries to maintain copies of the most recently used P pages 
(where P is the size of the paging device, in records) of the storage 
system on the paging device. The most recently used C pages (where C is 
the size of core memory in page frames) are to be in core, as well. (It 
is assumed that C is less than P.) 
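
The intended relationship between the two levels can be illustrated with a minimal sketch in Python (not the language of this thesis). It assumes one exact LRU ordering over all pages, whereas Multics only approximates LRU; the function names and the reference string are hypothetical. Under that assumption, the C pages in core are always among the P pages on the paging device.

```python
from collections import OrderedDict

def lru_order(references):
    """Return the referenced pages, most recently used first."""
    stack = OrderedDict()
    for page in references:
        stack.pop(page, None)      # forget the old position, if any
        stack[page] = True         # re-insert as most recently used
    return list(reversed(stack))

def residency(references, core_frames, pd_records):
    """Ideal contents of core and of the paging device under exact LRU."""
    order = lru_order(references)
    return set(order[:core_frames]), set(order[:pd_records])

# Hypothetical reference string over pages a..e, with C = 2 and P = 4.
core, pd = residency("abcabdaeb", core_frames=2, pd_records=4)
```

Because both sets are prefixes of the same ordering, the core set is necessarily a subset of the paging device set whenever C is less than P.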

Thus, pages being ousted from core may be written to the paging
device, even if a good copy exists on disk. This fact should be kept
strongly in mind when reading "try_to_write_page". Except for the case
where the paging device has no copy, pages which were identical to pages
in secondary storage are never written out. Pages of zeros are never
written out, but their logical description is so modified that they are
created in core when faulted on.

The processor hardware maintains usage information about a logical 
page in a hardware descriptor. Specifically, the occurrence of usage and/or
modification is noted in the descriptor. 

A page fault is resolved by finding a page of core into which to bring 
the page, and bringing it in. Finding a page of core consists of reor- 
ganizing the core used list to reflect the latest usage information, and 
finding the least recently used page frame, and using it. Pages which 
have been marked as modified cannot be claimed in this way, but are writ- 
ten out. When the writing is complete, at some future time, the page will 
be in the same state as a page which has not been recently used or
modified, and will be claimed in the handling of some future page fault. Note
that this 'writing' consists of initiating the physical operation, but not
waiting for it to complete. It is at this writing time that secondary
storage is allocated, and pages containing zeros are noted. It is at the time
that zero pages are noted that secondary storage is deallocated.

At the beginning of page fault handling, housekeeping is performed 
on the paging device, which consists of trying to insure that at least 
ten records are either free or in the process of being freed. This is 
done by removing as many of the least recently used pages on the paging
device as necessary. When a page is so removed, it is checked (via
software-maintained switches) to see if it is identical to a copy on disk.
If so, it may simply be deallocated from the paging device. If not, a
sequence known as a read-write sequence (rws) must be performed. This
sequence consists of allocating a page of core to be used as a buffer,
reading the page into it from the paging device, writing it to disk, and
deallocating the paging device copy. The core buffer is then freed.

A page fault which occurs on a page for which a read-write sequence
is in progress causes an event known as an rws abort to occur. The freeing
of the buffer page and the paging device page is inhibited, the buffer
page is used as the core copy of the page, and the fault is resolved.

An Explanation of the Language
Used to Express this Description

The language which we have used to describe Page Control is a
bastardization of PL/I, with new primitives for some basic operations
(enqueue, masked procedures, etc.) and an Algol-like formalism for
representing relationships among structured entities.

Underlined words are language keywords. Lower-case identifiers re- 
present names of subroutines, functions, or labels. Identifiers beginning 
with an upper-case character represent references to cells, which will be
described below. Statement syntax is essentially the same as PL/I, but 
":=" is used for assignment, and "=" is used to test equality. There is 
no lexical nesting of procedure or begin blocks. 

A program consists of begin blocks, entered from the outside world
in some unspecified way, procedures and functions, and declarations.
declare (dcl) declarations may appear anywhere, including outside of blocks,
and are global in scope. They define the class and type of variables,
and the types of Objects used by the program. local declarations appear
within blocks, and define a local scope of variables, identical to that
produced when a variable is used as a formal parameter in a procedure or
function.

The point of this language is to associate cells with values. The
domain of values is the space of Objects. Objects are unique. Two cells
have equal values if and only if their values are the same Object.

There are three classes of Objects: primitive Objects, structured
Objects, and set Objects. Within each class, there are different types
of Objects. Objects have no names. Only primitive Objects can be referred
to explicitly, i.e., other than by reference to a cell having the desired
Object as a value, or a function returning the desired Object.

Primitive Objects can be of three types. The first is boolean.
There are exactly two boolean Objects. One can be referred to explicitly
as true, the other false. The second is arithmetic. There is a
first-order infinity of these Objects, which are actually the integers. They
can be referred to explicitly as 75, etc. The third is
literal. They are simply arbitrary, primitive Objects, whose only useful
property is their uniqueness. They can be referred to explicitly as "foo",
"bar", "no stuff", etc. They are not character strings in any sense, but
simply unique primitive Objects of type literal.

Structured Objects consist of a finite number of cells. Any cell
can have as a value only one type of Object (implied is one class, as
well). These cells are called components of the Object. These cells do
have names, and they are specified in a declaration which describes the
concerned type of structured Object.

Set Objects consist of an ordered set of Objects of the same type
and class. All references except dequeue consider the set Object as
unordered. One can add to or enqueue to a set Object,
remove from or dequeue from it, ask if a given Object is a member of it,
or cause a cell to be assigned successive values, each value being a
different Object in the set Object, in no particular order.

Variables are the other type of cell. A variable can hold only one
class and type of Object, just like the other type of cell, the
structured Object component.

Assignment (performed by the ":=" operation in do statements and
assignment statements) consists of replacing the value of a cell with another
value, i.e., changing the value of the cell. The Object which was the
previous value is neither changed nor destroyed in any way.

Binding consists of saving the value of a variable when a procedure,
function, or begin block is entered, and restoring it when it is exited.
The latter operation is called unbinding. All assignments and bindings
made between the time a variable is bound and the corresponding unbinding
have a transparent effect when the block performing the binding is exited.
A local declaration of a variable in a block causes such a binding to take
place for that variable when the block is entered, and the corresponding
unbinding when it is exited. Binding also takes place for variables used
as formal parameters to procedures and functions. In this case, after the
old value is saved, the value of the corresponding actual argument is
assigned to the variable. Hence, all calls may be seen as "call by value".
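
The binding discipline described above behaves like the dynamic (shallow) binding of Lisp 1.5. The following is a minimal sketch in Python, not in the language of this thesis; the global table `cells` and the helper `bound` are hypothetical names introduced only for illustration. It shows that an assignment made inside a block to a bound variable disappears when the block is exited.

```python
from contextlib import contextmanager

cells = {}   # hypothetical global table: variable name -> current value

@contextmanager
def bound(name, value=None):
    """Bind `name` on block entry and unbind it on exit."""
    had_value = name in cells
    saved = cells.get(name)            # binding: save the old value
    if value is not None:
        cells[name] = value            # formal parameter: assign the argument
    try:
        yield
    finally:
        if had_value:
            cells[name] = saved        # unbinding: restore the saved value
        else:
            cells.pop(name, None)

def callee():
    with bound("X", 2):                # like a formal parameter X receiving 2
        cells["X"] = 3                 # assignment made inside the block
        return cells["X"]

cells["X"] = 1
inner = callee()                       # the callee sees its own value of X
outer = cells["X"]                     # the caller's value is untouched
```

The `finally` clause is what makes the unbinding unconditional, so that leaving the block by `return` still restores the caller's value.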

To refer to an Object, one can either refer to a cell containing it,
or, if it is primitive, one can refer to it explicitly. To refer to a
variable, simply state its name. To refer to a component of a structured
Object, state its component name, an open parenthesis, a reference to the
structured Object, and a close parenthesis.

An assignment is a reference to a cell, ":=", and a reference to an
Object of the same type and class declared for that cell.

Variables need not be declared. The default class of any cell is
structured, with a type the same as its name. The syntax for a structured
Object type declaration is as follows:

     {declare | dcl} structured Foo (compdcl-1, compdcl-2, ... compdcl-n);

     [ ] = optional     { } = select one

The compdcls, or component declarations, are of the same syntax as variable
declarations, except that the name is the name of the component, and the
optional keyword variable is illegal.

The syntax for a variable or structured Object component declaration 
is as follows: 

     {declare | dcl} [variable] Foo [type] [objtyp]

where objtyp is either boolean, literal, arithmetic, any structured Object
type named in a structured Object type declaration, or set objtyp, where
objtyp is, recursively enough, any possibility named in this sentence.

local declarations only name their variable, although they can declare 
its type as well. 

do statements differ from PL/I in that any cell can be used to the
left of the ":=", not necessarily variables. The particular form "do
Foo := range Bar" means that the value of Bar is a set Object, and the
do is to iterate over each Object therein, in no special order.

The special constructor function construct is used to create new 
structured objects. The syntax of a reference to it is 

     construct Foo (compname-1:object-1, compname-2:object-2 ...),
whose value is the new Object. 

The unique Object "null" can be used as a value of any cell. It has 
all types and classes. 

The predicate void takes as an argument a reference to a set Object,
and returns true or false (boolean Objects), depending on whether or not
it is empty. The operator "=" and the corresponding inequality operator may
be used to test if two references are equal, i.e., refer to the same Object,
or unequal. An appropriate boolean Object is returned as a value. The
operators "or", "and", and "not" operate on boolean Objects in the obvious
way. The conventional arithmetic operators operate upon arithmetic Objects,
returning an arithmetic Object with the expected value.

if statements have as their predicate a reference to a boolean Object.

A call statement consists of the word call followed by either a
procedure name and an optional argument list, or a complex function reference
and an argument list. An argument list is a parenthesized list of
(possibly zero) references to Objects separated by commas. A complex function
reference is a function reference to some outside-of-the-language function
which will return as a value a procedure (which one depends on the
arguments to the function), which will be called by the call statement, with
the arguments to the call.

The evaluation of arguments in "and" and "or" is conditional, as in
Lisp 1.5 (M3), and proceeds from left to right.

A Program to Find the Man Who Owns the Black House,
and Have Him and His Father Switch Houses

declare structured Person
             (Father type Person,
              House);
declare structured House
             (Color literal,
              Owner type Person);
declare Son Person, House2 House;
declare Brooklyn set House;          /*assumed to be initialized*/

switch_houses: begin;
     do House := range Brooklyn;                     //search the set "Brooklyn"
        if Color (House) = "black" then do;          //found him
           House2 := House (Father (Owner (House))); //find the other house
           Son := Owner (House);                     //remember who is the son
           House (Son) := House2;                    //Son now owns House2
           Owner (House2) := Son;
           House (Father (Son)) := House;            //Father owns house
           Owner (House) := Father (Son);
           return;
        end;
     end;
end;
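
For readers more familiar with a modern language, the same program can be approximated in Python. This is only a sketch under stated assumptions: Python attributes stand in for the components of structured Objects, a Python list stands in for the set Object, and the literal "black" is an ordinary string compared by value, whereas literals in the language above are unique Objects. The sample family at the end is hypothetical.

```python
class Person:
    def __init__(self, father=None, house=None):
        self.father, self.house = father, house

class House:
    def __init__(self, color, owner=None):
        self.color, self.owner = color, owner

def switch_houses(brooklyn):
    for house in brooklyn:                 # search the set "Brooklyn"
        if house.color == "black":         # found him
            son = house.owner              # remember who is the son
            house2 = son.father.house      # find the other house
            son.house = house2             # Son now owns House2
            house2.owner = son
            son.father.house = house       # Father owns the black house
            house.owner = son.father
            return

# Hypothetical data: the son owns the black house, his father the white one.
father, son = Person(), Person()
son.father = father
black, white = House("black", son), House("white", father)
son.house, father.house = black, white
switch_houses([white, black])
```

After the call, both directions of each ownership relation have been updated, which is why four assignments are needed rather than two.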






A Top-Level Programmatic View of Page Control Activity

A page fault causes the following: (page_fault)
     The paging device is housekept.
     Transient conditions such as i/o in progress or an rws on
          the faulted page are noticed and handled.
     A free page is claimed, and the faulted page is read or created
          into it.
     If i/o was started, the page is waited for.

Finding a free page consists of the following: (find_core)
     The core used list is searched for a good candidate.
     Recently used pages are not good candidates. They are skipped, and
          re-judged as not-so-recently used for next time.
     Pages which have been modified (stored into) cannot be claimed now.
          They are written out, and re-judged as not to have been modified.
     A page which has not been modified, and has been used approximately
          less recently than any other page, is "pre-empted" from its core
          frame, and this core frame becomes the free page frame.
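
The search just described resembles what is now commonly called a second-chance scan. A minimal sketch in Python follows; it is not the Multics code. The `writing` flag, the bounded number of passes, and the choice to move a claimed frame to the bottom of the list are assumptions made for the sketch, and `start_write` is a hypothetical callback standing in for the initiation of i/o.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class Frame:
    page: str
    used: bool = False
    modified: bool = False
    writing: bool = False    # hypothetical flag: a write has been started

def find_core(used_list, start_write):
    """Scan from the top (least recently used) for a frame to claim."""
    for _ in range(2 * len(used_list)):    # bounded scan, for the sketch only
        frame = used_list[0]
        if frame.used:
            frame.used = False             # re-judge as not so recently used
            used_list.rotate(-1)           # move it to the bottom
        elif frame.modified:
            frame.modified = False         # re-judge as not modified
            frame.writing = True
            start_write(frame)             # start the write, do not wait
            used_list.rotate(-1)
        elif frame.writing:
            used_list.rotate(-1)           # write still in flight; skip it
        else:
            used_list.rotate(-1)           # assumption: claimed frame goes to bottom
            return frame
    return None                            # nothing claimable in this sketch
```

A frame whose used bit was on is therefore claimable only on a later visit, provided it has not been referenced again in the meantime.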

Writing a page out consists of the following: (write_page)
     The page's contents are checked, and if all zeros, the page is
          flagged as not needing to be read or written. No writing takes
          place, and disk and paging device space allocated to the page
          are freed.
     The page is given a residence on disk, if it does not already have
          one.
     The page is given a residence on the paging device, if it does not
          already have one, and one is available.
     The page is written out to its residence on the paging device, if it
          has one, otherwise to disk. The completion of i/o is not waited
          for.
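
The decision sequence for writing a page can be sketched in Python as follows. This is an illustration under assumptions, not the Multics routine: the `Page` fields and the three callbacks (`allocate_disk`, `allocate_pd`, `start_write`) are hypothetical, and `allocate_pd` is assumed to return None when no paging device record is available.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Page:
    contents: bytes
    disk_addr: Optional[int] = None
    pd_addr: Optional[int] = None
    is_zero_page: bool = False

def write_page(page, allocate_disk, allocate_pd, start_write):
    """Return the (device, address) written to, or None for a page of zeros."""
    if not any(page.contents):
        page.is_zero_page = True                 # flag: need not be read or written
        page.disk_addr = page.pd_addr = None     # its space is freed
        return None
    if page.disk_addr is None:
        page.disk_addr = allocate_disk()         # give it a residence on disk
    if page.pd_addr is None:
        page.pd_addr = allocate_pd()             # may be None: none available
    if page.pd_addr is not None:
        target = ("pd", page.pd_addr)
    else:
        target = ("disk", page.disk_addr)
    start_write(page, target)                    # completion is not waited for
    return target
```

Note that a disk residence is assigned even when the page goes to the paging device, so that a later read-write sequence has somewhere to put it.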




Housekeeping the paging device consists of the following: (get_free_pd_record)
     An attempt is made to ensure that there are ten paging device
          records free or being freed, which is done as follows:
     The pd used list is searched for a good candidate to pre-empt.
          The search is made starting at the least-recently used pd record.
     Records which contain pages in core are recently used. They are
          re-judged as such and skipped.
     Records containing pages identical to pages on disk are acceptable.
          The pages in them are pre-empted, and the record is now free.
     Other records have to be written back to disk, which is done by
          performing a read-write sequence (rws) on them.
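
A minimal Python sketch of this housekeeping loop follows. It is an illustration only: the record fields, the bound on the number of records examined, and the `start_rws` callback are assumptions, and only the target of ten records comes from the text.

```python
from collections import deque
from dataclasses import dataclass

PD_FREE_TARGET = 10    # "ten paging device records" in the text

@dataclass
class PdRecord:
    page: str
    in_core: bool = False         # the page also has a copy in core
    same_as_disk: bool = False    # software-maintained "identical to disk" switch

def housekeep_pd(pd_used_list, free_count, start_rws):
    """Free, or start freeing, records until the target is met.

    Returns the new count of records free or being freed."""
    examined = 0
    limit = len(pd_used_list)                 # examine each record at most once
    while free_count < PD_FREE_TARGET and examined < limit:
        record = pd_used_list.popleft()       # least recently used record
        examined += 1
        if record.in_core:
            pd_used_list.append(record)       # recently used: move to bottom
            continue
        if not record.same_as_disk:
            start_rws(record)                 # must be copied to disk first
        free_count += 1                       # free now, or being freed
    return free_count
```

Counting a record as soon as its read-write sequence starts matches the phrase "free or being freed"; the record is not actually reusable until the sequence completes.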

Performing a read-write sequence on a page consists of the following:
(start_rws, rws_done)
     A free page of core is obtained.
     The page is read into it from the paging device.
     When the read is completed, the page is written out to the disk.
     When the write is completed, the page of core and the paging device
          record are freed.
     A page fault on the page involved in the sequence at any point
          during it causes the sequence to be aborted at the next completed
          operation in the sequence, and the core page is used as the
          page's home in core.
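
The read-write sequence and its abort can be viewed as a small state machine. The following Python sketch is an illustration under assumptions: the class, its state names, and its methods are hypothetical and do not correspond to start_rws and rws_done one for one.

```python
class ReadWriteSequence:
    """One rws: paging device -> core buffer -> disk, abortable by a fault."""

    def __init__(self, page):
        self.page = page
        self.state = "reading"          # read from the paging device started
        self.abort_requested = False

    def page_fault(self):
        """A fault on the page asks for an abort at the next completion."""
        if self.state in ("reading", "writing"):
            self.abort_requested = True

    def io_done(self):
        """Called when the current read or write completes."""
        if self.abort_requested:
            self.state = "aborted"      # buffer becomes the page's core copy
        elif self.state == "reading":
            self.state = "writing"      # start writing the buffer to disk
        elif self.state == "writing":
            self.state = "done"         # free the buffer and the pd record
        return self.state
```

In the "aborted" state neither the core buffer nor the paging device record is freed, so the page is back in core-drum exactly as if the sequence had never started.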



A Top-Level Description of the Objects Used by Page Control



A Page Object
          is the logical description of some page of the storage system,
          as opposed to a page frame on some device.

A Descriptor Object,
          in actuality a "page table word", is the physical descriptor by
          which a processor accesses a page. It contains a core address,
          usage bits, and a bit which causes a "fault" when off.

A Coreadd Object
          describes a physical core block. It describes the status of
          this block, including, implicitly, its position in the core
          used list.

A PDrec Object
          describes a paging device record, or frame. It describes the
          status of this frame, including, implicitly, its position in
          the paging device used list.

A Devadd Object
          represents a physical disk or drum address, and its contents.
          Included in this object is an identification of the device on
          which this page frame resides.

An Io-status Object
          is a hardware-generated object, which describes an input-output
          operation which has completed.

An Io-program Object
          is a sequence of commands for the system i/o controller to give
          to an i/o device. It specifies the type of operation required,
          the record within the device concerned, and a core address
          concerned.

A Trace-Datum Object
          is a recorded datum of information about traffic between disk
          and core-drum, for the purpose of the thesis experiment.



The Global Variables Used by Page Control

dcl

Page_table_lock literal,
          A quantity used to ensure that only one processor at a time is
          in page control. A process desiring to "hold" this lock loops
          continuously until it is unlocked, and then locks it.

CoreTop type Coreadd,
          The least recently used Coreadd Object.

Writes_outstanding arithmetic,
          The number of write operations started which have not yet been
          known to complete. Used as a heuristic to call post_any_io.

Rws_active_count arithmetic,
          The number of read-write sequences which have been initiated
          and not yet known to be completed.

Number_of_free_pd_records arithmetic,
          The number of paging device records free or in the process
          (rws) of being freed.

Top_of_pd_used_list type PDrec,
          The least recently used PDrec Object.

Channel_Queue type Io_program,
          The executable queue of i/o programs for a disk or drum.

Experiment_active boolean,
          Tells if metering experiment is in progress.

Trace_queue set Trace_datum,
          The total of all trace data accumulated by the experiment.

Undocumented Routines Referenced in this Program

page_wait (literal)
          Suspends the calling process until a call to notify is made
          with the identical literal. page_wait also unlocks the page
          table lock once the traffic control data bases are locked.

notify (literal)
          Causes any process which called page_wait with the identical
          literal to be resumed.

clear_associative_memory
          Causes all processors to clear their associative memories.
          This routine does not return until all processors have done
          so. Used to force access turnoffs and modified bit turnoffs
          to take effect.

allocate_disk_record ()
          Returns an unallocated Devadd Object. Marks it as allocated.

relinquish_disk_space (Phys_Devadd)
          Marks a Devadd Object as unallocated to allocate_disk_record.

start_io (Io_program)
          Starts a channel executing an i/o program.

thread_to_top (Coreadd)
          Changes core used list and value of CoreTop such that Coreadd
          is moved to the top of the core used list (least recently
          used).

thread_to_bottom (Coreadd)
          Changes core used list and value of CoreTop such that Coreadd
          is moved to the bottom of the core used list (most recently
          used). Next (Coreadd) now = CoreTop.
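
The remark that Next (Coreadd) equals CoreTop after thread_to_bottom suggests that the core used list is circular. A minimal Python sketch of such a list follows, under that assumption; the class names, the `prev` pointer, and the helper methods are hypothetical and are not taken from the Multics code.

```python
class Coreadd:
    def __init__(self, name):
        self.name = name
        self.next = self.prev = self    # a one-element circular list

class CoreUsedList:
    def __init__(self, names):
        self.top = None                 # CoreTop: least recently used
        for name in names:
            self._insert_at_bottom(Coreadd(name))

    def _unlink(self, c):
        if c is self.top:
            self.top = c.next if c.next is not c else None
        c.prev.next, c.next.prev = c.next, c.prev
        c.next = c.prev = c

    def _insert_at_bottom(self, c):
        if self.top is None:
            self.top = c
            return
        bottom = self.top.prev          # the bottom sits just before the top
        bottom.next, c.prev = c, bottom
        c.next, self.top.prev = self.top, c

    def thread_to_bottom(self, c):
        self._unlink(c)
        self._insert_at_bottom(c)       # afterwards c.next is self.top

    def thread_to_top(self, c):
        self.thread_to_bottom(c)
        self.top = c                    # same ring, top pointer moved back one

    def names(self):
        out, c = [], self.top
        while c is not None:
            out.append(c.name)
            c = c.next if c.next is not self.top else None
        return out
```

In a circular list the top and the bottom are adjacent, so moving an entry to the top differs from moving it to the bottom only in where CoreTop is left pointing.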



Appendix B

Implementation of the Hardcore Meters 

In this appendix, we relate the exact identities of the measured
events of Multics page control with which this experiment was concerned.
This is necessary both to provide validity for what we have done, and to
help others design similar techniques for other systems. It is assumed
that the first appendix has been at least partially understood, perhaps
with the overview sections fully understood.

We also discuss here the techniques used in implementing the Multics
Supervisor interface for this experiment.

As should be clear from Chapter 2, we are interested in metering
movement of pages in and out of the composite entity of core-drum. This
"movement" in fact consists of copy creation and copy destruction.
Movement "into" core-drum consists of the creation of a page copy in
core-drum where there previously was none, and movement "out of" core-drum
consists of the destruction of core or drum copies of a page, such that there
is no copy in core-drum. We speak of this creation and deletion as
movement because it is represented as movement of pages in an LRU stack.

We will now analyze the different types of motion in and out of
core-drum. Pages come into core-drum either from the outside, i.e., disk,
or by being created in core. Pages entering from disk can only do so as
the result of a page fault to disk or a pre-paging from disk, so a call to
"meter_disk" (see Appendix A) was installed in the i/o dispatching
routine to record all reads from disk. Pages created in core never
involve input/output. For the most part, these are pages which were
never touched before, and would thus cause a page fault no matter how
large core-drum were. These page faults, however, involve neither
multiprogramming, i/o, nor idle time, and are thus of slightly less
interest in performance predictions than more general page faults. We
chose to ignore them. There is one other type of inward motion, which
will be motivated in our discussion of outward motion.

Outward motion consists of oustings from core-drum. This consists
of oustings from drum (which, as can be verified from the program given
in the last appendix, can only happen if there is no copy in core) or from
core. Oustings from core are only oustings from core-drum if page
control (specifically, "try_to_write_page") decides that the page should not
be written to the drum, because of either lack of space there or the
concerned page being one of the special-cased "gtpd" pages. We will first
consider the oustings from drum. The ousting of a page which is different
from its disk copy, if one was ever made, is accomplished by the initiation
of a read-write sequence (rws). These rws initiations were thus metered as
outward movement. The ousting of a page which is identical to a disk
copy is done by simply claiming the drum frame (see "get_free_pd_record"),
and this event was likewise noted. Oustings from core interest us
whenever they are not oustings to the drum. (We define an ousting "from core
to the drum" to be an ousting from core when a copy of the concerned page
is on drum. Note that this implies an ordering of the hierarchical
memory system.) These oustings from core normally happen only for the
special "Global transparent paging device (gtpd)" pages of the root
directory, whose treatment was already fully covered, and in bad cases of
page faults or rws initiations, when there are no free drum frames available.
This case was also covered. As an interesting consequence of this
definition of an ousting not to the drum, we observed a large number
(approximately 2400 per hour) of oustings of pages which were all zeros, and
were all zeros when brought into core (the conjunction of these statements
essentially implies that these pages had no copies on either drum or disk).*
Some special experiments designed to discover the source of this peculiar
traffic were essentially fruitless. The data reduction programs described
in section 2.3 were modified to ignore these anomalous oustings.

One consequence of metering read-write sequence initiation as out- 
ward motion is that the aborting, or reversal due to a page fault, of a 
read-write sequence must be metered as inward motion. This was done (see
"rws_abort").
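
The classification of metered events into inward and outward motion can be summarized in a short Python sketch. The event names below are hypothetical labels invented for this illustration; the actual format of the trace data is not reproduced here.

```python
from collections import Counter

# Hypothetical event names; the thesis's actual trace format is not shown here.
DIRECTION = {
    "disk_read": "in",                  # page fault or pre-paging from disk
    "rws_abort": "in",                  # reverses an rws already counted as out
    "rws_start": "out",                 # drum page differing from its disk copy
    "pd_record_claimed": "out",         # drum page identical to its disk copy
    "core_ousting_not_to_drum": "out",  # e.g. gtpd pages, or no free drum frame
}

def net_movement(events):
    """Return (pages moved in, pages moved out) for a list of event names."""
    counts = Counter(DIRECTION[event] for event in events)
    return counts["in"], counts["out"]

moved_in, moved_out = net_movement(
    ["disk_read", "rws_start", "rws_abort", "pd_record_claimed", "disk_read"]
)
```

Pairing each rws_abort with an earlier rws_start is what keeps the count of pages resident in core-drum consistent when a sequence is reversed.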



One remaining event which had to be metered was that of page
destruction. The event we chose to represent this destruction was the handing
back to the free disk pool of the page's disk frame, if one existed.
This happens in two cases. First, explicit page destruction, via the
deletion of segments of the virtual memory requested by supervisor call, or
their explicitly requested truncation, causes this to happen. Secondly, as
we have described, find_core deallocates both disk and drum frames when a
page containing all zeros (a void page) is found with its used bit off. As
described in section 2.2, we are interested only in the destruction of
pages which are not in core-drum. The destruction of any
such non-void page will always involve the deallocation of a disk frame,
and thus will be properly metered. The destruction of void pages is not
a significant event, as they do not occupy a place in either the LRU or
LRU extension stacks of section 2.1. The discovery of a newly void page
by find_core also causes such an event to be recorded in the trace data
as a page deletion. However, this page cannot be in the extension stack,
since it was found by find_core precisely because it was, in fact, in core.
The data reduction programs were aware of these out-of-list deletions, and
duly ignored them. The destruction of pages in core-drum which were never
ousted is handled and ignored by this same mechanism.

*Even though these constituted about one quarter of all core-drum oustings,
they bear absolutely no significance to the experiment.

Interface Details

The Multics hardcore interface for this experiment was designed to be
a semi-permanent part of the Multics system, and thus to have as little
effect as possible on it when not in use. Thus, a page of the virtual memory
was allocated for the circular buffer described in section 2.3 and its
auxiliary data. When the experiment was enabled, via a highly privileged
supervisor primitive, this page was given a dedicated page frame and
withdrawn from the pool of pageable core. This was necessary to ensure that
page control, when storing data in this buffer, would not take a page
fault. Page control was also made to check a switch (the
"enabled/disabled" switch) as to whether or not this had been done before
attempting to reference the buffer. Another highly privileged supervisor
primitive freed the page frame given to this buffer, resetting this switch
before doing so.

The copying of data out of this buffer, via a privileged supervisor
entry point, ostensibly requires simply copying its contents into a
user-specified area. However, it was an aim of the interface design to ensure
that the buffer would not change while the information was being copied.
This could happen by either a processor not doing the copying taking a
page fault, or the processor doing the copying taking a page fault
referencing the user's area. Hence, to ensure that no page control activity
took place while this data was being copied, the data-gathering primitive
had to lock the "page table lock" while doing this copying. This, in
essence, prevents page faults from being processed, and cannot be done
until any other process has unlocked this lock. The effect of this lock
is to ensure that only one processor is in page control at a time. When
one has the page table lock locked, one must not take a page fault, or
infinite looping will result when that processor tries to lock the page
table lock to process it. What is more, the page fault handler is not
recursive. Thus, it was necessary to "wire" a dedicated page of the virtual
memory (allocate a dedicated core page frame and withdraw the latter from
the pool of pageable core) to copy the wired buffer into. Both pages
being wired (the buffer and the temporary copy page) ensured that no page
fault would take place during the copy. The contents of the copy page
could then be copied to the user-specified area after the page table lock
had been unlocked.

A further difficulty arose because the segment containing the temporary 
copy page is a one-per-system segment, and thus could not be used 
by two processes simultaneously. Thus, a lock had to be used to exclude 
such use of this segment. This lock would be locked by any process wanting 
to gather data before it wired the temporary copy page, and unlocked after 
it had been unwired. A processor finding the lock locked would be 
multiprogrammed, and the associated process notified when the lock was 
unlocked. 

The code which copies the wired buffer into the temporarily wired 
temporary copy page is entered only when the latter has been wired. 
However, it is possible that the former may not be wired, specifically, if 
the experiment has not been enabled. If this is the case, a fatal page 
fault with the page table lock locked would result. To avoid this, the 
enabled/disabled switch must be checked by this code, but it cannot check 
this switch until the page table lock is actually locked. Only when it is 
locked can no page possibly be made unreferenceable, as no other process 
can be in page control. The enabled/disabled switch is turned to disabled 
BEFORE the buffer is unwired, and the buffer can only be made 
unreferenceable AFTER the page table lock is locked AFTER it has been 
unwired. Thus, the sequence of events in a call to gather is as follows: 

1. Attempt to lock the copy page lock; multiprogram and retry if 
failure. 

2. Wire the temporary copy page. 

3. Attempt to lock the page table lock; loop until successful. 

4. Inspect the enabled/disabled switch; copy the wired buffer into 
the copy page if enabled, else copy zeros. 

5. Unlock the page table lock. 

6. Copy the temporary copy page out to the user-specified area. 

7. Zero the temporary copy page, and unwire it. 

8. Unlock the copy page lock; notify any waiting processes. 

The step of zeroing the copy page is done so that this page will be 
immediately claimable by find_core. This is done as both a friendly gesture 
and an attempt to keep this page off of the drum and out of the disk 
traffic visible to the experiment. The page frame is always void when 
unwired (returned to the pool of pageable core). 
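
The eight steps of a call to gather can be sketched in modern terms as 
follows. This is only an illustrative model, not the Multics code: the 
class and attribute names are hypothetical, Python threading locks stand in 
for the copy page lock and the page table lock, a blocking acquire stands 
in for "multiprogram and retry" and "loop until successful", and the page 
size of 1024 words is an assumption made for the example. 

```python
import threading

PAGE_WORDS = 1024  # assumed page size in words, for illustration only


class TraceInterface:
    """Toy model of the gather sequence; every name here is hypothetical."""

    def __init__(self):
        self.copy_page_lock = threading.Lock()   # guards the one-per-system copy page
        self.page_table_lock = threading.Lock()  # one processor in page control at a time
        self.enabled = False                     # the enabled/disabled switch
        self.buffer = [0] * PAGE_WORDS           # the buffer page (wired only when enabled)
        self.copy_page = [0] * PAGE_WORDS        # the temporary copy page
        self.copy_page_wired = False

    def gather(self):
        # 1. Lock the copy page lock (blocking acquire models the retry).
        with self.copy_page_lock:
            # 2. Wire the temporary copy page.
            self.copy_page_wired = True
            # 3. Lock the page table lock (blocking acquire models the loop).
            with self.page_table_lock:
                # 4. Inspect the switch only now, while the lock is held.
                if self.enabled:
                    self.copy_page[:] = self.buffer
                else:
                    self.copy_page[:] = [0] * PAGE_WORDS
            # 5. The page table lock is released on leaving the inner block.
            # 6. Copy the copy page out to the user-specified area.
            user_area = list(self.copy_page)
            # 7. Zero the copy page and unwire it.
            self.copy_page[:] = [0] * PAGE_WORDS
            self.copy_page_wired = False
        # 8. The copy page lock is released on leaving the outer block.
        return user_area
```

In this sketch the copy to the user area happens outside the page table 
lock, which mirrors the reason the temporary copy page exists at all: the 
user's area may take a page fault, and that must not happen while the page 
table lock is held. 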

The sequence for enabling the experiment is as follows: 

1. Wire the buffer page. 

2. Set the enabled/disabled switch to enabled. 

The sequence for disabling the experiment is as follows: 

1. Set the enabled/disabled switch to disabled. 

2. Unwire the buffer page. 
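
The ordering in these two sequences preserves one property: the switch 
never reads "enabled" while the buffer page is unwired. A minimal sketch of 
that property, using hypothetical names and a simple recorded history 
rather than any real Multics mechanism, might look like this: 

```python
class ExperimentSwitch:
    """Toy state for the two sequences; names are illustrative only."""

    def __init__(self):
        self.buffer_wired = False
        self.enabled = False
        self.history = []  # (buffer_wired, enabled) after every single step

    def _record(self):
        self.history.append((self.buffer_wired, self.enabled))

    def enable(self):
        self.buffer_wired = True   # 1. wire the buffer page
        self._record()
        self.enabled = True        # 2. only then set the switch to enabled
        self._record()

    def disable(self):
        self.enabled = False       # 1. set the switch to disabled first
        self._record()
        self.buffer_wired = False  # 2. only then unwire the buffer page
        self._record()


def switch_implies_wired(history):
    """True if 'enabled' was never observed together with an unwired buffer."""
    return all(wired or not enabled for wired, enabled in history)
```

Reversing either sequence would produce an intermediate state in which the 
switch is enabled but the buffer is not wired, which is exactly the state 
in which page control could fault while storing data. 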

The only remaining question of locking is that of the buffer be- 
coming unwired as page control is placing data in it. This cannot happen. 
Any page control operation sequence other than those just described can 
be summarized as: 
1. Attempt to lock the page table lock; loop until successful. 

2. Do all manner of page control. 

3. Conditionally, unlock the page table lock and end this sequence. 

4. Check the enabled/disabled switch; add data to the buffer if and 
only if it is enabled. 

5. Go back to step 2. 

The making of pages unreferenceable by find_core falls under step 2 
above. During the checking of the switch and the placing of data in the 
buffer, a page cannot be made unreferenceable on this processor. The 
page table lock excludes any other processor, and there is no problem. 
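
The general page control sequence can be sketched as a loop that runs 
entirely under the page table lock. The names below are hypothetical, and 
the stream of (event, finished) pairs is an assumption introduced only to 
model "do some page control work, then perhaps end the sequence"; it does 
not correspond to any actual Multics data structure. 

```python
import threading


class PageControlState:
    """Toy shared state; every name here is illustrative."""

    def __init__(self, enabled):
        self.page_table_lock = threading.Lock()
        self.enabled = enabled   # the enabled/disabled switch
        self.buffer = []         # stands in for the wired circular buffer


def page_control(state, work_units):
    """Run one page control sequence over (event, finished) pairs."""
    with state.page_table_lock:             # 1. lock the page table lock
        for event, finished in work_units:  # 2. do one unit of page control work
            if finished:                    # 3. conditionally end the sequence
                break
            if state.enabled:               # 4. record only if enabled
                state.buffer.append(event)
            # 5. go back to step 2
    # the lock is released on leaving the "with" block
```

Because the switch is read and the buffer written while the lock is held, 
no other processor can be in page control between the check and the store. 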
Appendix C 

System Performance Graphs during Experiments 


We present here graphs of user load, cpu utilization, and paging 
overhead as functions of time of day on the days of the two experiments, 
"dtm 21" and "dtm 23". This data was condensed from a graphical 
presentation of these parameters routinely prepared by the M.I.T. 
Information Processing Center. It is given here to provide a feeling for 
the relative user load during the experiments, and to allow a rough 
approximation to total system headway during the experiment to be 
computed. This may be computed by multiplying the time of the experiment 
(roughly 14 hours, or 50,000 seconds) by the fraction of the system which 
was not idle time or paging overhead (quite roughly, 40%), obtaining 
20,000 seconds, and multiplying by the system memory reference rate 
(400,000 references per second), obtaining 8 x 10^9 virtual memory 
references. 
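
The rough headway estimate above can be reproduced in a few lines. The 
variable names are ours, and the figures are the rounded ones quoted in the 
text (14 hours is 50,400 seconds, which the text rounds to 50,000). 

```python
# Rounded figures quoted in the text; all are rough approximations.
experiment_seconds = 50_000          # roughly 14 hours (14 * 3600 = 50,400)
useful_percent = 40                  # share not idle time or paging overhead
references_per_second = 400_000      # system memory reference rate

# Time the system spent making headway: 40% of 50,000 seconds.
useful_seconds = experiment_seconds * useful_percent // 100

# Total virtual memory references during one experiment.
total_references = useful_seconds * references_per_second

print(useful_seconds)    # 20000
print(total_references)  # 8000000000
```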